CNN Algorithm for Roof Detection and Material Classification in Satellite Images

Abstract: This paper proposes an algorithm that extracts the locations of buildings from satellite imagery and uses that information to classify the roof material. The materials are determined by assessing the conditions of the area where each building is located and detecting the positions of buildings in wide-area satellite images. Buildings with incomplete roofs or poor roofing materials are more likely to suffer serious damage from disasters or external shocks. To address this problem, we propose an algorithm that detects roofs in satellite images and classifies their materials. The algorithm first locates areas where buildings are likely to exist based on the roads in the image. Using the images of the detected buildings, we then classify the roof material with a proposed convolutional neural network (CNN) model consisting of 43 layers. In this paper, we propose a CNN structure that detects building areas in large images and classifies the roof materials in the detected areas.


Introduction
Artificial intelligence algorithms achieve outstanding results in predicting or discriminating complex or nonlinear models. Among nonlinear models, disaster-damage models are difficult to predict because they are affected by a variety of variables, such as weather and terrain. Today, images obtained from satellites are widely available, as is sufficient computing power to process them. Satellite images come in three main types depending on how they are captured [1]. The first contains color information collected from reflected sunlight, the second measures surface heat using infrared rays, and the third uses water vapor in the upper atmosphere. In a previous study, RGB images alone were used to classify three types of vegetation land cover (tree, shrub, and grass), applying DeepLabV3+, a deep learning method for semantic segmentation [2,3]. In our study, satellite images containing color information are used to classify roofs. Satellite images are very large and contain enormous amounts of imaging information, so research on spatial decomposition analysis of these images is increasing. Many images are overlapped to convert satellite images into 3D [4][5][6]. Since digital surface model (DSM) images [7][8][9][10], which contain a great deal of information, are generally lower in resolution than LiDAR data [11][12][13], research on 3D point cloud matching technology continues to improve its accuracy [14][15][16][17]. Image processing methods for satellite images use grid-based [18][19][20], spectral-based [21][22][23][24], and resolution-based [25,26] algorithms to classify zones and distinguish the required data using differences from the surrounding data [27].
Natural disasters, such as earthquakes, hurricanes, and floods, can have fatal effects on people and communities. It would be ideal to predict and defend against these hazards, but such prediction models are expensive and time-consuming because they vary from region to region and involve massive quantities of data. To prepare for such events, buildings can be protected, especially incomplete or poorly constructed buildings that are exposed to greater damage. We use the roof as surface data to approximately analyze the safety of a building. A roof belongs to the final stage of construction, so a building with a completed roof can be judged to be a completed building. Cluster-based [28][29][30] and object-based [31,32] classification methods exist to identify roofs on maps. The satellite image processing methods currently being studied supplement existing data through the fusion of satellite images and sensor data, and are often used to acquire approximate information over wide areas, such as overall distributions and environmental situations, rather than fine-grained information. When sensor data at the target location are difficult to collect or additional data are limited, these methods can be problematic. Roof detection research also addresses recognition in narrow spaces by utilizing three-dimensional data from areas where buildings are densely distributed or by recognizing specific objects or patterns that roofs contain. In this paper, we propose an algorithm that uses two-dimensional satellite images, the environment in which buildings are mainly distributed, and the surrounding information to locate buildings without additional data and to classify the material of each building's roof. Building information can thus be obtained from satellite imagery through segmentation and classification.
According to these studies using satellite imagery, buildings are typically elevated objects in images, and together with the height details given by DSM data they exhibit the following characteristics:

1. Color: The colors of roads and of each building's roof are generally distinct, so color is used to differentiate them.

2. Road: Occlusion is used to classify spaces so that buildings and roads are properly separated.

3. Stereo: Using the image's height information, identify the places where the height changes quickly.

4. Noise: Noise in 3D image data concentrates in sections where the structure is imperfect or sharply defined, such as corners, fault lines, and valleys.

5. Pattern: A completed roof has a specific pattern. It would be best to classify roofs using meaningful information such as color, material, and shape, but such patterns and forms are difficult to find. The shape and color of roofs vary between buildings, and even on the same building the shape and material can differ, which complicates classification.
The framework of each building is extracted using environmental data around the building, and the roof is checked for its material and the completeness of the building so that vulnerable buildings can be identified in advance of a disaster. To determine the roof, we use convolutional neural network (CNN) algorithms that detect areas where roofs are likely to be present in large satellite images, use roof features to detect the roof, and classify the material of the detected roof. The proposed CNN model consists of 43 layers. Training covers four common types of roof material, and since a roof often exhibits patterns similar to its surroundings, we designed the model with the intention of separating these features well. The training results of the proposed model showed an approximately 9% accuracy improvement in material classification compared to GoogleNet. In summary, we propose an algorithm that detects roofs in satellite images and learns to classify their materials.

Roof Detection Using Image Processing
To detect the roof in an image, the location of the roof must first be found. Satellite images are very large in size and high in resolution, so they are processed separately by region.
To detect the roof of a building, the space is first separated by roads. Because buildings are divided by roads, roads are detected in the image first. Areas divided by a continuous or distinctive reference line are determined to be roads and paths, because roads are generally built in straight lines. Figure 1 shows the areas separated by roads. In general, buildings are likely to exist in the spaces separated by these straight roads, so these spaces are explored. First, we look for parts with different colors. Land and roofs are usually composed of different colors, so we mark areas where buildings are likely to exist based on color within the separated regions. Figure 2 shows the space separated again by color within the road-delimited regions where buildings are likely to exist. When blue denotes land, each differently colored space on the ground is considered a candidate house. However, since it is difficult to tell which section is actually a building, all places with distinct colors are separated. Next, the color-coded areas in which actual buildings are located must be detected. At this stage, the satellite image should either be available in three-dimensional form or a depth map should be extracted from the image using stereo algorithms. For structures, the ground determines the lower bound of a color-coded area, and all spaces higher than the ground are judged to be buildings. However, if three-dimensional data cannot be obtained, we make this determination using the noise or patterns in the region. Figure 3 shows an image containing a building used to determine such an area in an environment without three-dimensional data. We isolate the noise region using the image in Figure 3. Distinct locations such as roads and buildings are separated using the Fourier transform, because pixel differences occur and divide the regions.
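As a rough illustration of the color-based separation step described above, the following minimal NumPy sketch marks pixels whose color deviates strongly from a reference ground color. The function name and the distance threshold are our own choices for illustration, not from the paper.

```python
import numpy as np

def color_candidate_mask(rgb, ground_color, threshold=60.0):
    """Mark pixels whose color differs from the dominant ground color.

    rgb          -- H x W x 3 array of uint8 pixel values
    ground_color -- reference (R, G, B) tuple for the surrounding land
    threshold    -- Euclidean color distance above which a pixel is
                    treated as a possible roof (hypothetical value)
    """
    diff = rgb.astype(np.float64) - np.asarray(ground_color, dtype=np.float64)
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return dist > threshold

# A 2 x 2 toy image: brown "ground" pixels and one red "roof" pixel.
img = np.array([[[120, 100, 80], [120, 100, 80]],
                [[120, 100, 80], [200,  40,  40]]], dtype=np.uint8)
mask = color_candidate_mask(img, ground_color=(120, 100, 80))  # only [1, 1] is True
```

In practice the reference ground color would come from the dominant color inside each road-delimited region rather than a fixed constant.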
To apply the Fourier transform to the image [33], we separate the RGB channels and apply the transform to each channel.
We used the discrete Fourier transform in Equation (1), applied pixel by pixel, where u is the frequency in the x-axis direction, v is the frequency in the y-axis direction, W is the horizontal size of the image, and H is the vertical size of the image:

F(u, v) = Σ_{x=0}^{W−1} Σ_{y=0}^{H−1} f(x, y) e^{−j2π(ux/W + vy/H)}   (1)
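The per-channel transform of Equation (1) can be sketched with NumPy's FFT routines. This is an illustrative reimplementation, not the authors' code; the log-magnitude and center-shifted display are common conventions we assume here.

```python
import numpy as np

def channel_spectra(rgb):
    """Apply the 2-D discrete Fourier transform of Equation (1) to each
    RGB channel separately and return the log-magnitude spectra with the
    zero frequency shifted to the centre for display."""
    spectra = []
    for c in range(3):                                   # R, G, B channels
        F = np.fft.fft2(rgb[:, :, c].astype(np.float64))
        spectra.append(np.log1p(np.abs(np.fft.fftshift(F))))
    return np.stack(spectra, axis=-1)

# Toy 4 x 4 image: a flat channel puts all its energy at zero frequency,
# which lands at the centre (2, 2) after the shift.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, :, 0] = 10
spec = channel_spectra(img)
```

A noisy, patterned roof region spreads energy across many frequencies, whereas flat ground concentrates it near the zero-frequency centre, which is what makes the spectra useful for separating the two.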
The results of applying Equation (1) to Figure 3 are shown in Figure 4, which reveals the roof area through its noise and patterns. The figure shows that the roof of the building is noisier than the ground and that a particular roof pattern appears. The locations of these noise patterns are used to separate roofs and houses. Using the noise, we detect regions by setting thresholds on the delimited data (Figure 4a): regions consisting of values below the threshold are classified as empty or continuous spaces, such as roads, while pixels whose values change by more than the threshold are all treated as boundaries. Figure 4b uses the gradient of the image to find the changing intervals. Since the image is two-dimensional, with x and y data, the gradients along the x and y axes are summed respectively. The changing points can be seen as boundary areas between roads or buildings. By matching these data pixel by pixel, we can check the boundaries where pixel values change and detect buildings based on the areas where the boundaries appear prominently.
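The gradient-based boundary search described for Figure 4b can be sketched as follows. This is a minimal NumPy illustration with invented names; the paper does not publish its implementation.

```python
import numpy as np

def boundary_profiles(gray):
    """Locate candidate road/building boundaries by summing the absolute
    image gradient along each axis.  Peaks in the returned profiles mark
    the columns/rows where pixel values change quickly."""
    gy, gx = np.gradient(gray.astype(np.float64))
    col_profile = np.abs(gx).sum(axis=0)   # change along the x axis
    row_profile = np.abs(gy).sum(axis=1)   # change along the y axis
    return col_profile, row_profile

# Toy image: dark left half, bright right half -> one vertical boundary,
# so the column profile peaks there and the row profile stays flat.
img = np.zeros((4, 6))
img[:, 3:] = 100
cols, rows = boundary_profiles(img)
```

Matching the peak positions across both axes gives the grid of candidate boundaries from which building regions are cut out.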
Using the separated data in Figure 4, the corner points of a building are detected from the edges of the area where the building exists in the original image. To detect corner points in the threshold-classified data, we compare the data in the eight directions around each pixel and treat a pixel as a corner point when three or more of its neighboring pixels change in value. Figure 5 shows the corner points of the buildings extracted from the image as red dots; the roof area is divided based on these corner points. We use these divided roof areas as the input images for training. Using the detected corner points, we connect the corner points of the same building and separate the image for each building. Because the roof of a single building usually has a uniform color, we explore along the x and y axes based on the color of a point and create the boundary of the building by connecting the outermost corner points.
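The eight-direction corner test can be sketched as a direct loop over the label map. The function below is our own illustration of the rule, assuming a map in which each separated region carries one label; the threshold of three changed neighbours follows the text.

```python
import numpy as np

def corner_points(labels, min_changes=3):
    """Flag corner candidates in a label map: a pixel counts as a corner
    when three or more of its eight neighbours carry a different label
    (min_changes = 3, the threshold described in the text)."""
    h, w = labels.shape
    corners = []
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            changed = sum(labels[y + dy, x + dx] != labels[y, x]
                          for dy, dx in offsets)
            if changed >= min_changes:
                corners.append((y, x))
    return corners

# Toy map: a 2 x 2 "roof" block (label 1) in a field of background zeros;
# all four roof pixels qualify as corners.
m = np.zeros((5, 5), dtype=int)
m[1:3, 1:3] = 1
pts = corner_points(m)
```

Connecting the outermost of these points, as the text describes, yields the polygon used to crop the roof for the classifier.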

Classification of the Material of the Roof
The material of the roof is classified by the CNN algorithm applied to the roof areas detected in Section 2. There are four types of roofing material: concrete cement, healthy metal, incomplete, and irregular metal. Roofs are divided into these four types because architectural styles differ between countries for reasons such as climate and religion, and the materials and shapes of roofs vary with the architectural style. We train the CNN using a database of roofs labeled with each material. Figure 6 shows an example image for each material. In the input image, max pooling is used after the initial 3 × 3 convolutional layer to highlight the feature values of the initial data. Max pooling is used to differentiate between patterns and materials; it retains only the largest values, so the characteristic features are preserved. After that, filters of three sizes are used to isolate different characteristics, and the data are merged using concatenation. The computed values are divided into two parts: the left-hand side of the structure uses convolutional and pooling layers to make the features more prominent, while the right-hand side keeps features close to the initial values to avoid losing the initial features. Table 1 presents the parameters of the layers used in Figure 8. The weight initializer uses the He method, and the filter, pool, stride, and padding values are set as shown in the table.
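As a reminder of what the max pooling step does to a feature map, here is a minimal hand-rolled sketch (illustrative only; in practice a framework implementation is used): only the largest response in each window survives, which keeps the strongest pattern and material cues while shrinking the map.

```python
import numpy as np

def max_pool(feature_map, pool=2, stride=2):
    """2 x 2 max pooling with stride 2: each output cell is the maximum
    of the corresponding window of the input feature map."""
    h, w = feature_map.shape
    out_h, out_w = (h - pool) // stride + 1, (w - pool) // stride + 1
    out = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            win = feature_map[i * stride:i * stride + pool,
                              j * stride:j * stride + pool]
            out[i, j] = win.max()
    return out

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 2],
               [2, 2, 1, 3]])
pooled = max_pool(fm)   # -> [[4, 2], [2, 5]]
```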
To prevent the size of the data from decreasing, the padding is set so that the output is the same size as the input image, with padded values of 0. For the left region of Figure 9, we extract the feature points with the structure shown in Figure 10. All of the computed features are combined by concatenation to classify roofs using a fully connected layer. Table 2 sets the number of filters and the pool sizes as the parameters of the layers in Figure 9 and likewise sets the padding so that the image does not shrink, with padded values of 0. Figure 10 shows a structure that highlights various feature points by positioning a convolutional layer at the end of the network to extract feature points from the images.
The later layers change the filter size to find the previously highlighted features once again. Table 3 shows the parameters of the layers used in Figure 10. The fully connected layer is used to learn the features from the previously computed image. The number of hidden layers and the number of nodes is a largely empirical part of the design phase. We explored several configurations that avoid overfitting and improve learning through iterative training with initially arbitrary numbers of layers and nodes, and then used the model with the highest accuracy among them. The network has a total of six hidden layers: Input: 592, Hidden Layer 1: 900, Hidden Layer 2: 1200, Hidden Layer 3: 600, Hidden Layer 4: 200, Hidden Layer 5: 50, Hidden Layer 6: 10, Output: 4. We show the structure of the proposed neural network in Figure 11.
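The fully connected head with the layer widths listed above can be sketched as a NumPy forward pass. The He-style random weights here are placeholders for illustration, not trained parameters, and the ReLU/softmax choices are our assumptions since the text does not name the activations.

```python
import numpy as np

# Layer widths from the text: 592 input features, six hidden layers,
# and 4 outputs (one per roof material class).
SIZES = [592, 900, 1200, 600, 200, 50, 10, 4]

def he_init(rng, fan_in, fan_out):
    """He-method initialization, as named in Table 1."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def forward(x, weights):
    """ReLU hidden layers followed by a softmax over the 4 materials
    (activation choices assumed for illustration)."""
    for W in weights[:-1]:
        x = np.maximum(x @ W, 0.0)          # ReLU
    logits = x @ weights[-1]
    e = np.exp(logits - logits.max())       # stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
weights = [he_init(rng, a, b) for a, b in zip(SIZES[:-1], SIZES[1:])]
probs = forward(rng.normal(size=592), weights)   # 4 class probabilities
```

The argmax of the output vector selects one of the four material classes.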

Experiment Environment
The experimental environment used two Intel i7-8550 CPUs, 32 GB RAM, and an RTX 2080 Ti GPU, running Windows 10 and Matlab 2020a; the library used for learning was Caffe2. The database contained a total of 11,620 training images labeled with the four materials, including concrete cement: 4739, healthy metal: 2643, and irregular metal: 2500 images, of which 22 images were selected randomly. Training verified the structure of the proposed algorithm, and its performance was compared to GoogleNet.

Detection of Roof Areas
Roads must be detected in satellite images for roof detection. Figure 12 presents the original image used for detection. To detect areas with roofs in satellite images, lines consisting of straight segments and curves are detected. Figure 13a shows the lines detected in the image, while Figure 13b depicts the roofed areas derived from the detected lines. Once the area of a building is detected, the building must be located within that area. If three-dimensional data are available at this point, the building can be detected using height. If there are no 3D data, it can be detected using noise and roof patterns. Figure 14 illustrates the region where structures are most likely to be found; its large-valued parts are roof-patterned, roof-colored, and differ in noise from the surrounding environment, so buildings are expected in those areas. We store both the location and the area where the building is located in the image and use these image data to classify the material of the roof with the CNN algorithm.

Roof Material Classification
The CNN algorithm is applied to the image of the building detected in Section 4.2. Figure 17 shows the image of a single building extracted from a satellite image. The green-roofed building extracted from the red-circled section is passed to the convolutional layers to find its feature points. Figure 18 shows the feature images of the extracted building's roof detected by the convolutional layers. The material of the building is classified using these detected feature images. The features vary depending on the building's material, pattern, and noise, allowing the roof to be classified. The detected roof data are used as training data or as data for detection.

Learning for Roof Detection
The database used for training is the one described in Section 4.1. We train the proposed CNN structure and GoogleNet for roof detection using the same database and compare the results.
For training, the fully connected parameters were set as shown in Table 4; the proposed CNN model and GoogleNet used the same fully connected configuration. The proposed parameters include a total of six hidden layers. Figure 19 shows the training and testing accuracy of the proposed CNN model and GoogleNet, with the batch size set to 8 and the number of epochs to 70. Since the validation data are randomly selected from the training data, we repeat the experiment a total of 20 times to verify that the model trains well even when the training data change. Figure 19 shows that the proposed CNN model achieves higher accuracy than GoogleNet with the proposed parameters, and Table 5 shows the accuracy results of the 20 training runs. Accuracy varies by as little as 5% and as much as 7%. Table 6 shows the material-specific accuracy of the proposed CNN and GoogleNet. The material-specific accuracy shows that the proposed CNN is 3-9% more accurate in training, validation, and testing, confirming the suitability of the proposed structure and parameters. We select precision, recall, and F1-score to quantitatively evaluate the performance of the model; Table 7 shows the resulting values for each metric. The precision indicator shows good performance at over 95%, while recall reaches 87%. The F1-score is also over 91%, showing that the model is fully practical. Figure 20 shows test accuracy by batch size. The smaller the batch size, the higher the training accuracy, but also the longer the training takes, so an appropriate batch size must be set. In the proposed CNN structure, the batch size is set to 8.
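The precision, recall, and F1-score reported in Table 7 are computed from true-positive, false-positive, and false-negative counts in the standard way. The counts in this sketch are hypothetical, chosen only to illustrate the formulas, and do not reproduce the paper's results.

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from raw counts; F1 is the harmonic
    mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for one material class: 95 true positives,
# 4 false positives, 13 false negatives (illustration only).
p, r, f = prf1(95, 4, 13)   # roughly 0.96, 0.88, 0.92
```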
The batch size is set to 8 because batch sizes smaller than 8 show no significant difference in the graph while the error becomes larger. Table 8 shows the accuracy for different numbers of epochs. With too few epochs, training is likely to terminate before it is complete, leading to underfitting, and with too many epochs, overfitting is likely; therefore, a proper epoch setting is needed. Table 8 shows the training accuracy and error, repeated 20 times for each epoch setting, to determine how accurately the model learns at each setting. The error range of the accuracy shrinks steadily up to 70 epochs and then converges once 70 is exceeded, indicating that training is complete at 70 epochs or more. Since too many epochs lead to overfitting, we show that setting the number of epochs to 70 is appropriate.

Conclusions
The roof of a building can be used as a measure of whether the building is complete or incomplete. If there is a problem with a building, major casualties can occur in disasters or building collapses, so analysis of the material used for the roof can serve as a measure of the buildings situated in an area.
To obtain images of the roofs of buildings over large areas, the necessary information must be extracted from satellite images, because they are very large in resolution and size and contain a wide range of information. Information on the roof should be extracted from these satellite images. Buildings can be easily distinguished if the original image contains three-dimensional information or if the height can be obtained from a depth map, but in the absence of such information, the buildings must be detected first.
The conditions of the areas in which buildings are located were determined. Since a roof's color information is distinct from the natural surroundings, it can be separated from roads and other areas, and it has noise characteristics different from its environment. These conditions are used to detect areas with buildings in satellite images and to extract images of the roofs.
We classify the material of the roof using the results learned with the proposed CNN structure applied to the roof images. The roof materials cover two broad categories: widely used materials and incomplete roofs. Compared with the existing GoogleNet structure, the proposed CNN model showed a 5-7% improvement in training, validation, and testing accuracy. Looking at the proposed structure's precision, recall, and F1-score, better performance metrics than the conventional model are observed, and the learned material-specific accuracy also improves by up to 9%. Using the roof material obtained from satellite photographs, it is possible to investigate and collect information over wide areas for purposes such as emergency preparedness or construction reinforcement. In the future, we aim to investigate algorithms that can extract further information from satellite images, such as surrounding road signs and building details, and to evaluate conditions more accurately using algorithms that learn from them.

Conflicts of Interest:
The authors declare no conflict of interest.