Identification and Extracting Method of Exterior Building Information on 3D Map

Although the Korean government has provided high-quality architectural building information for a long period of time, its focus on administrative details over three-dimensional (3D) architectural mapping and data collection has hindered progress. This study presents a basic method for extracting exterior building information for the purpose of 3D mapping using deep learning and digital image processing. The method identifies and classifies objects using the faster region-based convolutional neural network (faster R-CNN) model. The results show an accuracy of 93% for façade detection and 91% for window detection; these could be further improved by more clearly defining the boundaries of windows and reducing data noise. The additional metadata provided by the proposed method could, in the future, be included in building information modeling databases to facilitate structural analyses or reconstruction efforts.


Introduction
The Korean government implemented the "Act on the Promotion of the Provision and Use of Public Data" in 2013, which discloses data owned by the national and local governments to the public for the purpose of economic development and quality-of-life improvements [1]. Information owned by government institutions is provided through the Public Data Portal, which, as of January 2022, includes 107 national core datasets and 68,323 datasets. Both the quantity and quality of these datasets are continuously increasing, and the number of successful business projects that use this public information is also growing [2]. For urban architecture, 3785 datasets of administrative information are provided, including those for buildings, houses, land, purpose of use, urban planning and regeneration, transportation, real estate, infrastructure, IoT sensing, spatial information, safety, energy use, etc.; furthermore, the government continues to expand and upgrade the scope of the data collected. Among them, Architectural Administration Information has disclosed 280 million cases of data since 2015 [3][4][5].
However, compared to other fields, the demand for information related to architecture is limited because such information consists primarily of administrative information. Although the government is making efforts to promote architectural administrative information, such as providing web services, building a cloud-based system, and building a big data hub, demand remains low and there are no notable success cases [3,4,6,7]. This is because architectural administration information primarily comprises licensing information, building ledgers, closure and cancellation ledgers, life cycle management, and energy use; information such as building location, maintenance, accreditation and permission details, date, size (e.g., area, number of floors, etc.), type (e.g., purpose of use, structure, etc.), blueprint (e.g., layout drawings that do not violate security), interested parties, and energy (e.g., usage, grade, etc.) is of limited use to the private sector [3,4,7].
To utilize architectural administrative information, data expansion through connection with private or other public data can be considered. The most promising candidate is exterior building information, which directly affects human visual perception. Modern architects such as Le Corbusier and theorists such as Jencks have noted the importance of visual information on the urban environment [8,9]. In addition, visual information has great potential for use because it directly relates to public concerns, such as the aesthetics of buildings, legal regulations, safety, security, regional characteristics, and city images and landscapes. If exterior building information is linked with architectural administration information, the building information's scalability and completeness increase, as does its utility value in the private and public fields.
This point of view aligns with the data utilization policy of the Korean government and the direction of its BIM (building information modeling) policy [10,11]. In July 2020, the government announced the Digital New Deal policy for digital transformation centered on D.N.A. (data, network, AI) [10]; in the construction field, this includes the opening of administrative information and a BIM-centered data transformation in design, construction, and building management [10]. The government announced the Architectural BIM Activation Roadmap (December 2020) and the Roadmap for Digital Transformation of the Construction Industry (June 2021) to expand the application of BIM, and BIM application will be mandatory in all construction industries from 2025 [11]. BIM is a design technology based on a 3D shape model that contains various attributes [12]. Because it has a higher degree of completeness than existing 2D drawings, it is widely applied to the entire construction process of architectural planning, design, construction, and maintenance [12]. It is also a standard information platform that enables collaboration and communication in various fields [13]. High-quality BIM information can secure economic feasibility, time savings, work efficiency, productivity, and safety in the construction industry [14,15]. In addition, it can promote the development of the construction industry by combining it with fourth-industrial technologies such as AI, big data, digital twins, and automated design and construction [16].
However, since BIM-applied building design or construction only applies to new buildings, there are very few cases in Korea where BIM covers the existing building stock. As of 2021, BIM design is compulsory only for large-scale public projects of more than USD 25 million ordered by the government, and although the scope of application is gradually expanding, it is expected to take time for all buildings to be transformed into BIM. Nevertheless, as government policies and social changes progress, most existing buildings will eventually be digitally converted to BIM. To respond to these changes, it is necessary to build information gradually by linking attribute information such as building shape, elevation components, and materials to the architectural administrative information already constructed as a dataset. First, building exterior information related to shape information should be linked and constructed as a dataset. In particular, since it takes considerable money, time, and effort to survey all the buildings in the country one by one, an effective method is needed to overcome this. Therefore, in this study, images of existing buildings already available from private or public sources are used. In recent years, Korean web services have improved the quality and quantity of urban data with architectural image information, urban exteriors, street views, and other 3D maps. Although the resolution of these data is lower than that of photos taken on site, the potential for future use is very high because of the vast amount of information built nationwide, and the resolution is increasing every year. Studies from various perspectives identify elements using city images provided on the web. For example, No et al. proposed a deep learning algorithm that detects the characteristic information of Braille blocks on the road using street view images to diagnose spaces with insufficient installation [17]. Seiferling et al. (2017) presented a method to detect and quantify the condition of street trees through Street View images for tree management [18]. Yu et al. (2019) tried to build a BIM database after extracting visual information of buildings using Street View and two-dimensional aerial images to identify buildings vulnerable to earthquakes [19]. In structure and construction, images are sometimes used from the viewpoint of construction quality or safety. For example, Santarsiero et al. (2021) presented a method of collecting bridge survey information from bridge images through OpenStreetMap (OSM) and Street View to support a bridge risk classification procedure [20].
Unlike previous studies, this study uses a 3D map to extract building exterior information. This is because image distortion occurs in Street View images depending on the shooting point, and buildings are sometimes obscured by plantings or automobiles. A 3D map, by contrast, presents no such obstacles to fully identifying exterior building information, because 3D maps are created by the photogrammetry technique, which generates 3D information by overlapping objects captured through multi-angle aerial photography. However, there are limits to extracting all exterior building information nationwide by human effort alone. As computer vision (CV) and deep learning technologies advance, time, effort, and cost can be saved.

3D Maps
National and international private IT corporations, such as Google, Daum (Kakao), and Naver, provide both two-dimensional (2D) and 3D map services, including street views, aerial and satellite photos, and spatial building information, that show the exteriors of buildings [21][22][23][24][25]. Three-dimensional maps provide users with a more immersive urban experience, and the continuous improvement of autonomous drones and vehicular cameras helps to improve the resolution and sophistication of 3D maps. These maps differ from 2D satellite maps: 3D mapping photographs the exteriors of architectural structures from several angles using drones or aircraft, then synthesizes the photographed images in three dimensions based on coordinate values to create a map (i.e., photogrammetry). The national government employed the spatial information open platform "Vworld" to provide 2D and 3D spatial information (i.e., physical and logical spaces, as well as properties belonging to figures) to connected public administration databases and is currently developing a 3D shape model based on architectural building forms and floor heights [21]. In particular, the Seoul Metropolitan Government has constructed a 3D map (S-Map) of the entire 605 km² of Seoul that features greater location accuracy and higher resolution than government or private maps [22]; furthermore, images generated by S-Map are easily extracted with no obstacles covering the exteriors of the buildings (see Figure 1).
Owing to the rapid development of 3D maps, users are provided with convenient and detailed building information, but these are currently used merely as a visual reference. We propose that composing a dataset for exterior building architectural information will contribute to future research and reconstruction efforts.

Advances in Image Processing and Deep Learning Technology
CV technology is the current industry standard for image processing. Rapidly improved in the 2000s, CV is primarily used for the identification, division, tracking, mapping, estimation, and exploration of objects [26][27][28]. Based on knowledge-based methodology, it primarily focuses on feature extraction methods via the detection of contours, corners, and key points within the image. CV's application is divided into image processing (e.g., removing noise and emphasizing features) and background analysis (e.g., extracting or identifying information) [29]. Image processing technologies based on CV were able to detect objects but suffered from an inability to properly classify those objects.

After 2010, image processing research developed to a practical utility stage in many fields, such as autonomous cars, fingerprinting, facial recognition, and robot control (with the rapid development of artificial intelligence) [28]. It was considered a significant milestone when a technique for classifying the extracted features was developed. Meanwhile, the importance of data preprocessing and conversion processes such as spatial domain conversion, geometric conversion, and frequency conversion has diminished since the early 2000s, supplanted by the detection and classification of objects and motion detection made possible through image inputs. Based on CNNs (convolutional neural networks), deep learning technology for image processing is being developed into platforms such as YOLO (You Only Look Once) and Detectron based on several object detection models such as fast and faster R-CNN. These platforms also provide pretrained models and high-capacity image data for commonly used objects such as people, vehicles, chairs, and buildings.
Deep-learning-based image identification technology is used in many fields: in medical fields, these processing techniques are used to analyze chest X-ray images, MRI images, and ophthalmic images, and to develop software that quickly and accurately diagnoses various diseases and aids in effective treatment. Google developed a system that can diagnose diabetic retinopathy using CNN technique-based retinal analysis, and Lunit INSIGHT developed a medical support software system for lung disease and breast cancer diagnosis [30,31]. In the fields of transportation and security, accidents are prevented by tracking and categorizing the movement of objects using closed-circuit television (CCTV) images; systems based on deep learning technology forecast the traffic volume of cars, taxis, bicycles, and other vehicles on roads, then create accident predictions by analyzing the movements of pedestrians and vehicles [26,32,33].
Prior to the application of deep-learning-based technology, attempts to identify the exteriors of buildings focused on the recognition of building openings. Several approaches were developed, such as deciding whether an opening is glass through RGB (red, green, and blue) color values in elevation areas, as well as detecting entrances and structural combinations using lines, colors, and textures [34][35][36]. Deep-learning-based work differs from these earlier studies in that identification elements are categorized along with algorithms that extract features from the images; most such studies focus primarily on extracting building outlines, façade characteristics, and outer wall materials [37][38][39]. These later studies are similar to previous studies in how they identify the components of the façade; however, they are limited to the identification of shapes, elements, and materials.
Our study presents a multifold method of identifying elements in high-rise buildings by appropriately combining image identification and deep learning techniques on 3D maps. Similar to other studies, it does not target high-resolution building images; thus, there may be limits to producing sophisticated results. However, it is intended as basic research because it has the advantage of being scalable for nationwide application.

Deep Neural Networks and Video Detection
Early neural networks suffered from many shortcomings, including slow learning time, data supply and demand problems, and overfitting; the eventual resolution of these issues allowed the hidden layer of existing artificial neural networks to be expanded into deep neural networks (DNNs) comprising multiple layers [40]. As with conventional neural networks, DNNs comprise input layers, hidden layers, and output layers, and are able to model complex nonlinear relationships. DNNs have recently developed into different forms, including recurrent neural networks (RNNs) in natural language processing and convolutional neural networks (CNNs) in computer vision: CNNs are primarily used for image identification as artificial neural networks that imitate visual processing methods, whereas RNNs were designed for the purpose of processing time series data such as audio, sensors, and characters [41,42].
Given their utility in image identification, CNNs are the basis for major deep learning platforms involving object detection in images, such as the aforementioned YOLO and Detectron2. With TensorFlow, a Python-based deep learning open-source package provided by Google, research can be conducted on several data mining, machine learning, and deep learning models; it also facilitates the construction of large visualization models designed using data flow graphs via the processing power of GPUs (graphics processing units) and CPUs (central processing units). In addition to TensorFlow, platforms such as PyTorch and Caffe2 can be used. YOLO, a deep learning model that specializes in detecting objects in images, improved the computational speed of existing region-based convolutional neural network (R-CNN)-series models by allowing one network to extract features, create bounding boxes, and categorize classes simultaneously [43].

Identification and Extracting Exterior Building Information
For our study, we selected a neighborhood in Jongno-gu, Seoul, as the case subject site for the identification of building exteriors and information extraction, because it contains a high concentration of high-rise buildings; any buildings that met the criteria were categorized as "high-rise buildings" (see Figure 2). This is because the images of high-rise buildings are high-resolution, and because it is necessary to establish a model that identifies components (e.g., location, area, shape, texture, material, color of windows, entrances, pillars, etc.) within the high-rise architectural type, allowing a framework to be established for application to other building types as well. Here, a model for detecting building façades is proposed by applying deep learning technology. The model largely comprises (1) an image extraction unit, (2) an image preprocessing unit, (3) a building façade recognition unit, and (4) a data management unit. The model structure is illustrated in Figure 3.
First, a 3D building map image is used to identify the building exteriors and extract façade information. The image preprocessing unit sets the region of interest (RoI) on the façade of the building to train the deep learning model from the acquired image. The building façade recognition unit detects the building exteriors using a deep learning model wherein the images of building exteriors are learned, and extracts building façade information from the building that was identified. Finally, the identified information related to building façades is added to an integrated database for building management. We propose that this database be used in conjunction with the building ledger information database to provide better datasets for buildings.

Method Used to Acquire Building 3D Map Image Data
This section describes our method for acquiring images of building exteriors from 3D building maps. First, we used 3D building maps based on the information provided by S-Map [22]. S-Map provides user-friendly 3D-type building maps and enables users to set several layers, including administrative divisions, a state basic district system, road name information, subway information, and building names, all at different resolutions. Additionally, it provides the user with the ability to set altitude and tilt angle settings (see Figure 1).
S-Map supports its own application programming interface (API), but no third-party APIs exist for the extraction of 3D building images such as those we intend to use. Thus, we implemented a macro program to automatically move and capture the screen via AutoIt software to acquire full images of 3D buildings [44]. Automating the mouse and keyboard movements makes these repetitive tasks trivial; this software was used to extract approximately 500 façade images of buildings in Jongno-gu, Seoul. Because image distortion increases as the viewing angle deviates from the front of the façade, we focused on the front of each building as much as possible and set the extraction altitude and inclination angle to 114 m and 14°, respectively.
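The macro's sweep over the target area can be sketched in outline. AutoIt itself is a GUI-automation scripting tool, so the hypothetical Python helper below only computes the grid of map positions such a macro would visit; the grid extent, step size, and function name are illustrative, and only the altitude and tilt constants come from the text:

```python
# Sketch: grid of map positions a screen-capture macro would visit.
# ALTITUDE_M and TILT_DEG are the settings reported in the study;
# the grid extent and step size below are hypothetical.

ALTITUDE_M = 114   # extraction altitude used in this study
TILT_DEG = 14      # camera inclination angle used in this study

def capture_waypoints(x0, y0, x1, y1, step):
    """Return (x, y, altitude, tilt) tuples covering the target area
    in a row-by-row sweep, as a capture macro would."""
    points = []
    y = y0
    while y <= y1:
        x = x0
        while x <= x1:
            points.append((x, y, ALTITUDE_M, TILT_DEG))
            x += step
        y += step
    return points

waypoints = capture_waypoints(0, 0, 400, 200, 100)
print(len(waypoints))  # 5 x 3 grid -> 15 capture positions
```

Each waypoint would then be paired with a mouse-move, a pause for the map to render, and a screenshot command in the actual macro.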

Preprocessing Method for Training the Deep Learning Model
This section describes our image processing method for training the deep learning model using the 3D building images acquired by the image extraction unit described in the previous section. Generally, when a deep learning model is being trained to detect objects, coordinates and images for the section corresponding to an object area within an image, called a bounding box, are used as well. The bounding box for each building image is rectangular and is expressed in the form of upper-left coordinates (minx, miny) and lower-right coordinates (maxx, maxy). Figure 4 illustrates an example of displaying an object area in an image as a bounding box.
We set bounding boxes for the building areas from the extracted image information to train the deep learning building detection model. The bounding boxes were set manually on all images in both the model training dataset and the test dataset, and the VGG Image Annotator (VIA) open-source software [45] was utilized to extract bounding box coordinates. When training the deep learning model, images are required for learning, and it is crucial to include metadata or annotations containing object information in the images. Metadata contain information including image paths, sizes, and bounding box coordinates and are stored in either JavaScript Object Notation (JSON) or eXtensible Markup Language (XML) format; VIA supports object region annotation and metadata extraction for training the model. Figure 4 presents the result of setting bounding boxes in 3D building images using the VIA software. In this study, bounding boxes were set up to recognize the façades and windows of buildings from the 3D building images. Image metadata in this study were extracted in JSON format, the general structure of which is shown in Table 1.

Building Façade Recognition Unit
This section describes one method for training a deep learning model and one for recognizing the façade of buildings using preprocessed image information and metadata. It also partially illustrates how various building information can be acquired using image processing techniques from the recognized building façade.

Deep-Learning-Model-Based Method to Recognize Building Façade
Prior to the development of deep learning models, object detection technology was based on in-image feature algorithms such as the scale-invariant feature transform (SIFT) and speeded-up robust features (SURF), or on machine learning algorithms including K-nearest neighbors (KNN) and support vector machines (SVM), to extract the main features in the image using mathematical estimates. The speed and accuracy of detecting objects in images have improved in conjunction with advances in deep learning technology. The major algorithms for object detection in images are as follows: DetectorNet [46], SPP Net [47], VGG Net [48], R-CNN [32], fast/faster R-CNN [49][50][51], YOLO [52][53][54][55], and mask R-CNN [56].

Scheme 1. An example of JSON including labeled object information.
In this study, 3D building images were learned using faster R-CNN, an improved model of R-CNN, which was then employed for the detection and extraction of building façade information. Building images are not included in the COCO (Common Objects in Context) dataset [57], so we needed to additionally train the object detection model.
In general, earlier R-CNN-series models extract region proposals in the image using the selective search algorithm and then extract features with a CNN model in each candidate area; classification is performed to categorize objects using an SVM based on the extracted features. In faster R-CNN, a separate region proposal network (RPN) is instead applied to the extracted feature maps. Additionally, the location of the object is refined by performing a bounding box regression operation, which yields a higher mAP (mean average precision). The structure of faster R-CNN is illustrated in Figure 5.
When learning, label data in images (including images for learning, building façade bounding box information, and window bounding box information) are included in the JSON file for data configuration. In this study, the Detectron2 platform [58][59][60] provided by Facebook AI Research is used for training.
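The Detectron2 training setup described above can be sketched roughly as follows. The dataset name, file paths, class count comment, and hyperparameters are placeholders, and the annotations are assumed to have been converted to Detectron2's COCO-style JSON format; this is a configuration sketch under those assumptions, not the authors' actual script:

```python
# Sketch: fine-tuning a faster R-CNN model in Detectron2 on façade/window
# annotations. Paths, dataset name, and hyperparameters are illustrative.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

# Register the annotated 3D-map images (COCO-style JSON assumed).
register_coco_instances("smap_facades_train", {},
                        "annotations/train.json", "images/train")

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")  # start from pretrained weights
cfg.DATASETS.TRAIN = ("smap_facades_train",)
cfg.DATASETS.TEST = ()
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2   # façade, window
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 3000

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```

Starting from COCO-pretrained weights is the usual choice here because, as noted above, building façades and windows are not COCO categories and only the detection heads need substantial retraining.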
Using these methods, we were able to acquire an up-to-date trained deep learning model. Figure 6 illustrates the result for a 3D building image tested on the learned model. The results show that high-rise buildings are well detected, but windows are not, due to the distortion of the buildings and the overall image size. Section 4 outlines the results of the model learning and its performance.

Method for Extraction of Building Façade Information
This section outlines the method for extracting the façade information of a building using the image processing technique based on the results presented through the deep learning model. The exterior details of the building and its façade vary, and the 3D building exterior image used in this study suffers from low image quality and unconventional shape. Thus, the scope of this study was to distinguish windows from walls among the exterior and façade information of the building, the method for which is twofold: we first employ a simple classification method, then a detailed classification method.
In the simple classification method for windows, only the existence or absence of windows is identified based on the façade of the building. As previously mentioned, windows can also be recognized through training window areas during façade training. While window areas exist within the façade area of the building, they are sometimes detected outside the façade area due to noise in the image area (see Figure 7). To prevent this, our study compared the locations of the façade areas and the window areas of the building, thus improving detection accuracy and facilitating the acquisition of the set of window areas (w_j) included in the building façade area (B_i). The equation for the comparison between the detected building façade area (B_i) and the detected window area (w_j) is as follows:

W_i = { w_j | |B_i ∩ w_j| = |w_j| }, i = 1, 2, . . . , n; j = 1, 2, . . . , m

where |A| is the size of area A, m is the number of detected windows, and n is the number of detected buildings.
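The containment comparison can be sketched as follows, assuming axis-aligned bounding boxes in (x_min, y_min, x_max, y_max) form; the function names are illustrative:

```python
def area(box):
    """Size |A| of an axis-aligned box (x_min, y_min, x_max, y_max), in pixels."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

def intersection_area(a, b):
    """Size of the intersection |A ∩ B| of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return area((x1, y1, x2, y2))

def windows_in_facade(facade, windows):
    """Keep only the window boxes that lie entirely inside the facade box."""
    return [w for w in windows if intersection_area(facade, w) == area(w)]

facade = (0, 0, 100, 200)
inside = (10, 10, 30, 30)     # fully contained in the facade
outside = (150, 10, 170, 30)  # spurious detection outside the facade
kept = windows_in_facade(facade, [inside, outside])
```

A window box is kept only when its intersection with the façade box equals its own area; a small tolerance could be added to accept near-containment under detection noise.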
When there is a window area within the façade area of the building, the building is categorized as having windows; otherwise, it is categorized as a building with no windows. This is used as part of the preprocessing in the detailed classification method, which consists of two parts from which information is extracted: the type of façade windows and the ratio of façade windows. The classified contents for each detail are as follows:
• Type of façade window: front curtain wall, repeated single windows, others (mixed windows, etc.).
• Ratio of façade windows: ~25%, ~50%, ~75%, and ~100%.
Among the types of façade windows, the front curtain wall describes instances in which the window covers the entire building, while repeated single windows indicates that the window is repeated on every floor in a consistent pattern. Finally, the "other" category includes mixed windows and half-height horizontal windows (see Figure 8).
The distinction between the front curtain wall and the repeated single windows can be acquired by using the number of window areas recognized in the façade area of the building. Buildings in the "other" category refer to forms to which both factors can be applied (mixed windows). When the exterior information of buildings is extracted based on a deep learning model, the distinction of buildings and windows is manually defined in advance. Provided the size of the building façade area (B_i) and that of a window area (w_x ∈ W_i) within it are similar or equal, the building is classified as a front curtain wall building. If two or more window areas are detected, the building is classified as a repeated single window building. Buildings that are not classified as either type are classified as "other". The classification function, D, to detect the type of façade windows is as follows:

D(B_i) = front curtain wall, if |B_i| − |w_x| ≤ threshold for some w_x ∈ W_i; repeated single windows, if |W_i| ≥ 2; others, otherwise (i = 1, 2, . . . , n; x = 1, 2, . . . , |W_i|)

Next, to classify the façade window ratio, the ratio of the size of the window area to the size of the façade area of the building is employed. The window ratio is classified into four categories, namely, ~25%, ~50%, ~75%, and ~100%, and the window ratio (R) can be acquired using the following equation:

R_i = ( Σ_{w_x ∈ W_i} |w_x| ) / |B_i|

Next, we employ a simple classification method for the walls, using RGB channel values to distinguish colors in the image so that the color of the wall is easily detected.
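A minimal sketch of the classification function D and the binned window ratio R, assuming pixel-area inputs; the threshold is expressed here as a fraction of the façade area, which is an assumption since the study does not state its value:

```python
def classify_window_type(facade_area, window_areas, threshold=0.05):
    """Sketch of the classification function D.

    facade_area: |B_i|; window_areas: [|w_x| for w_x in W_i], all in pixels.
    threshold is a fraction of the facade area (an assumption; the study
    leaves the threshold unspecified).
    """
    # Front curtain wall: some window area is nearly as large as the facade.
    if any(facade_area - w <= threshold * facade_area for w in window_areas):
        return "front curtain wall"
    # Repeated single windows: two or more separate window areas detected.
    if len(window_areas) >= 2:
        return "repeated single windows"
    return "others"

def window_ratio_category(facade_area, window_areas):
    """Bin the ratio R of total window area to facade area into ~25/50/75/100%."""
    r = sum(window_areas) / facade_area
    for bound, label in [(0.25, "~25%"), (0.5, "~50%"), (0.75, "~75%")]:
        if r <= bound:
            return label
    return "~100%"
```

Under this reading, a façade of 10,000 px with one 9800 px window is a front curtain wall, while three 300 px windows give the repeated single window class.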
First, the RGB channel values in the façade area of the building were extracted, excluding the window areas, and colors were classified as black, maroon, silver, orange, green, or blue. The RGB channel values for each of these colors are defined as reference channel values, where r_i(r), g_i(g), and b_i(b) denote the pixel values for red, green, and blue, respectively. The average RGB channel value of the corresponding building façade is

(r̄, ḡ, b̄) = ( (1/n) Σ_{i=1}^{n} r_i, (1/n) Σ_{i=1}^{n} g_i, (1/n) Σ_{i=1}^{n} b_i )

where n refers to the number of pixels in the detection region. Figure 9 illustrates the RGB channel and the assigned color of each building façade.
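The color-matching step can be sketched as follows. The six reference RGB triples are assumptions (standard web-color values), since the study's exact reference channel values are not reproduced here:

```python
# Assumed reference channel values for the six colors (standard web-color
# triples; the study's exact reference values are not given here).
REFERENCE_COLORS = {
    "black":  (0, 0, 0),
    "maroon": (128, 0, 0),
    "silver": (192, 192, 192),
    "orange": (255, 165, 0),
    "green":  (0, 128, 0),
    "blue":   (0, 0, 255),
}

def mean_rgb(pixels):
    """Average (r, g, b) over the facade pixels (window areas already excluded)."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def match_color(pixels):
    """Assign the facade the reference color nearest to its mean RGB value."""
    avg = mean_rgb(pixels)
    # Nearest reference color by squared Euclidean distance in RGB space.
    return min(REFERENCE_COLORS,
               key=lambda name: sum((a - b) ** 2
                                    for a, b in zip(avg, REFERENCE_COLORS[name])))
```

For example, a façade averaging near (250, 160, 10) would be matched to orange under these assumed references.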
Next, we utilize a detailed method to classify the walls by classifying the materials themselves (glass, cement, etc.) on the façade of the building. To recognize and classify these materials, they must be manually relabeled and classified by viewing the detected façade of the building in the image; we did not conduct a detailed classification of the walls separately. Nonetheless, given the results acquired after detecting the façade and windows, there is sufficient reason to consider that future classifications can be performed separately. In addition, materials that are not clearly distinguished with the naked eye (e.g., cement, bricks, and tiles) could be considered together, using the façade color classification method for buildings that is used in the simple classification method.

Experiment Design
This section briefly describes the experimental performance plan for identifying the façade of buildings to which deep learning technology was applied, as described in Section 3, and verifying the performance of the façade information extraction methodology. First, the experimental process and results for verifying the performance of the model are described. Next, a process for extracting building façade information and an experimental result is outlined, using an image processing technique from the extracted image.

Results of Building Façade Recognition Using the Deep Learning Model
This section explains the results of building façade recognition using the deep learning model. We used the faster R-CNN model in the Detectron2 environment to recognize the façade of the building from the façade image. The total number of acquired images was 554, and the approximate building façade and windows numbered, respectively, 800 and 900. Among the acquired images, approximately 388 (70%) images were used for learning, and approximately 166 (30%) images were used for the test. The building façade image used here was that of a high-rise building in Seoul. Regarding the hyperparameters for learning, the learning rate was set to 0.0005 and the maximum number of iterations was set to 5000 times. Additionally, by using the early stopping function of learning, the learning time was reduced since it halted when no further reduction in loss occurred on subsequent iterations; in our study, iteration occurred approximately 1200 times before stopping. The graph in Figure 10 illustrates this point. The loss function utilized the root mean squared error (RMSE), and it is crucial to note that the model learned from the decreasing trend of loss values on each iteration, as opposed to absolute values.
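The 70/30 split described above can be reproduced with a simple shuffled partition; the seed and function name are illustrative:

```python
import random

def split_dataset(image_ids, train_fraction=0.7, seed=42):
    """Shuffle image IDs and split them into training and test sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    ids = list(image_ids)
    rng.shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]

# With the study's 554 images, a 70/30 split yields 388 training and 166 test images.
train_ids, test_ids = split_dataset(range(554))
```

Fixing the random seed keeps the training/test partition stable across runs, which matters when comparing hyperparameter settings such as the learning rate or iteration budget.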
The result of the model's learning indicated that the total loss value was 0.03624. As an indicator for the evaluation of the learned model, accuracy was used, derived as follows:

Accuracy = (number of correctly detected objects) / (total number of objects) × 100 (%)
The building façade detection accuracy and window detection accuracy were 93% and 91%, respectively.

Result of Extracting Building Façade Information Based on Image Processing Techniques
This section describes the results of extracting the building façade information using image processing techniques from the recognized building façade images while using a deep learning model. We divided building façade information into (1) the simple classification method for windows, (2) the detailed classification method for windows, and (3) simple classification for walls. The first part (simple classification method for windows) was already conducted in the previous section, so we handle the remaining parts in this section.
The detailed classification of windows focused on the types of façade windows (front curtain walls, repeated single windows, other) and the classification of façade window ratio (~25%, ~50%, ~75%, and ~100%). To verify the suggested method, 43 front curtain wall façade images, 31 repeated single window façade images, and 38 other images were used as test data. As a result, the accuracy in recognizing front curtain wall, repeated single windows, and other building types was 95% (41 images), 90% (28 images), and 100% (38 images), respectively.
Next, the simple classification for walls focused on the RGB channel value of the walls. After we removed the window areas from the façade areas prior to extracting the channel values, the average of the channel values of the remaining area was calculated. "Matching" was performed based on the channels, which were calculated according to six previously defined channels. As a result, all 30 images for each color were well matched. This shows that the color of the building can be distinguished when RGB channel values are used. This methodology can be used in preprocessing for the detailed classification method for walls in future studies.

Conclusions
This study aimed to detect the façade of a building using deep learning and image processing techniques to recognize and classify façade information such as wall colors and the window ratio of the building. The study contributes to the field in three ways. First, a model was built to learn the façade and window images of buildings using a 3D building façade map, which had never previously been attempted. In this study, data were acquired by automating the 3D building façade map extraction process, whereas previous studies were conducted with actual images in the form of street views. Unlike street views, which are photographed from limited angles, this model has the advantage of learning more sophisticated building façade images from several angles. Second, the current study achieved better detection accuracy using the latest object detection model. In previous studies, basic image processing techniques or deep learning models not optimized for object detection were used, whereas, in this study, faster R-CNN models optimized for object detection and segmentation within images were employed. Third, a methodology for recognizing and extracting information regarding the façades of buildings using image processing techniques was proposed. The proposed methodologies could be adopted in future studies to acquire fundamental building façade information automatically. Moreover, they can be used in preprocessing to recognize more detailed information regarding building façades.
As a result of detecting building exteriors and extracting information, the techniques were 93% and 91% accurate in correctly detecting façades and windows, respectively, while showing excellent performance in the experiments in which façade information was extracted. However, in the process of detecting façades and windows using deep learning techniques, the false detection rate for windows during learning was higher than that for building façades due to unclear window boundaries. In fact, data noise is likely to be variable due to experimenter errors when generating learning data using the VIA software. If window detection can be improved, then the learning results will be more satisfactory.
In addition, the reliability of the accuracy may be reduced due to the small size of the experimental dataset when extracting building façade information based on processing techniques. Despite this, verifying performance with minimal data is possible since it is based on rules rather than the use of a learning model or statistical methodology. Moreover, this can be used in preprocessing for determining specific information such as the façade material of a building in the future.
This methodology can be used to collect exterior building information while saving time, cost, and effort on a national level. It may then be added to public building databases (e.g., those containing utility ledgers) to further increase value. In particular, exterior building information such as the beauty of urban architecture, regional characteristics, city images, or landscape can be constructed as a BIM dataset, and a "digital twin" can be implemented to build real-time information with the development of smart architecture and urban technology.