Profile-Based Building Detection Using Convolutional Neural Network and High-Resolution Digital Surface Models

Farajelahi, Behaeen; Arefi, Hossein

doi:10.3390/rs17142496

Open AccessArticle

Profile-Based Building Detection Using Convolutional Neural Network and High-Resolution Digital Surface Models

by

Behaeen Farajelahi

¹ and

Hossein Arefi

^2,*

¹

School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, Tehran 1439957131, Iran

²

i3mainz, Institute for Spatial Information and Surveying Technology, School of Technology, Mainz University of Applied Sciences, D-55118 Mainz, Germany

^*

Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(14), 2496; https://doi.org/10.3390/rs17142496

Submission received: 20 May 2025 / Revised: 11 July 2025 / Accepted: 14 July 2025 / Published: 17 July 2025

Download

Browse Figures

Versions Notes

Abstract

This research presents a novel method for detecting building roof types using deep learning models based on height profiles from high-resolution digital surface models. While deep learning has proven effective in digit, handwritten, and time series classification, this study focuses on the emerging and crucial area of height profile detection for building roof type classification. We propose an innovative approach to automatically generate, classify, and detect building roof types using height profiles derived from normalized digital surface models. We present three distinct methods to detect seven roof types from two height profiles of the building cross-section. The first two methods detect the building roof type from two-dimensional (2D) height profiles: two binary images and a two-band spectral image. The third method, vector-based, detects the building roof type from two one-dimensional (1D) height profiles represented as two 1D vectors. We trained various one- and two-dimensional convolutional neural networks on these 1D and 2D height profiles. The DenseNet201 network could directly detect the roof type of a building from two height profiles stored as a two-band spectral image with an average accuracy of 97%, even in the presence of consecutive chimneys, dormers, and noise. The strengths of this approach include the generation of a large, detailed, and storage-efficient labeled height profile dataset, the development of a robust classification method using both 1D and 2D height profiles, and an automated workflow that enhances building roof type detection.

Keywords:

building roof type detection; classification; digital surface model; deep learning; convolutional neural network; height profile

1. Introduction

Buildings are essential elements in various applications, including smart city development, urban planning, crisis management, and mapping. Building detection is fundamentally a classification task, where different features are identified, distinguished from one another, and grouped into specific categories. In the past two decades, many studies have been conducted to detect building roof types from aerial imagery, point cloud, or a digital elevation model using both conventional and deep learning-based methods [1]. The conventional detection methods are divided into two categories: segmentation and classification methods. However, many studies deal with object detection based on a combination of both methods. Among pixel-based methods, techniques such as clustering [2], thresholding [3], and the combination of clustering and thresholding [4] are standard. Edge-based methods include active contour and k-means clustering [5]. Edge-based object detection relies on sudden discontinuities in intensity changes since the objects at boundaries tend to change sharply. Various operators are used to generate edge images. Gradient-based first-order derivatives (Sobel operator, Prewitt operator, Robert operator) and Laplacian-based second-order derivatives are standard methods for edge detection [6]. Therefore, the edges in the input data are used to detect and segment objects.

Area-based approaches include region-growing and split–merge methods. The region-growing method is also known as the similarity-based segmentation method. It identifies objects from spectral features such as texture, color, and intensity in image data and geometric features such as normal vector, curvature, and coplanarity in point cloud data. This method requires initial points to start the algorithm, and points with similar features are grouped in the same region [7,8,9]. Many studies have also been conducted on combining clustering and region-growing object detection [7,10]. Another widely used method for object detection is classification. This method is generally divided into two methods: supervised classification and unsupervised classification. In supervised classification, training data are created with user supervision, and in unsupervised classification, the desired classes are designed using clustering methods [11]. Despite achieving high accuracy and performance, they suffered from weaknesses. These included dependence on data density, spatial resolution of input data, precise initial parameter settings, suitability only in areas with simple topography, and a lack of automation.

Due to these limitations, these methods have been replaced by deep learning methods that can automatically identify features and detect objects. Different architectures of deep learning networks have been developed depending on the specific applications and input data type, and various networks such as YOLOv1 [12], Faster R-CNN [13], and EfficientDet [14] have been developed to perform this task with high accuracy and efficiency. In recent years, advanced methods such as mutual-assistance learning [15] and multi-task learning [16] have been introduced to enhance the stability and improve the performance of object detection systems, yielding better results by leveraging joint training and the representation of shared features. These advances have provided a strong foundation for more specialized applications such as building detection. Building detection presents challenges, including trees surrounding buildings, adjacent buildings, buildings with long lengths and short widths, the absence of sharp edges, and noise in the input data [17]. Therefore, any feature or noise can affect the results of building roof type detection. However, many methods require initial assumptions to be made, which reduces their comprehensiveness and automatic applicability in different areas.

In contrast to most deep learning-based building detection methods, which are generally based on pixel classification, semantic segmentation, or object detection using various data types, such as point cloud, aerial imagery, digital elevation models, and other high-spatial resolution remote sensing data, the proposed method follows a distinctly different methodology. We introduce a novel framework for generating one-dimensional (1D) and two-dimensional (2D) height profiles from normalized digital surface models (nDSMs). Unlike conventional methods that require processing very large-volume datasets, our height profile-based method significantly reduces data complexity, dimensionality, memory requirements, and computational costs while fully preserving the essential geometric information required for accurate roof type detection. In this method, instead of analyzing large and complex data, the focus is directly on the geometric features of height profiles of the building’s cross-sections. Three types of height profile-based inputs—namely 1D vectors, binary images, and two-band spectral images—were designed, and dedicated convolutional neural networks (CNNs) were developed for each input type. The proposed method demonstrates robustness against common challenges such as adjacent buildings, noise, and structural complications on building roofs. To the best of our knowledge, this is the first study to systematically leverage 1D and 2D height profiles extracted from nDSMs for deep learning-based building roof type detection. In this study, we classified and identified seven roof types—flat, shed, gable, pinnacle, hip, mansard, and combined—using height profiles derived from high-resolution digital surface model data (Figure 1).

The primary objective of this research is to develop an efficient and automated method for classifying roof types based on height profiles and deep learning techniques. To achieve this goal, the main contributions of this research are as follows:

A hierarchical preprocessing framework is proposed to improve the extraction of single buildings from complex urban environments.
A new annotated dataset containing seven distinct roof types was developed to support more comprehensive research on roof type classification.
Three novel classification methods leveraging height profile features were developed and evaluated to enhance roof type detection performance.

To achieve the aforementioned objective, the following research questions are formulated and will be addressed and discussed throughout this study.

How does the quality and accuracy of the input data affect the results?
What are the key network training parameters determined for each method?
What are the advantages and limitations of 1DCNNs and 2DCNNs over each other?

The remainder of this paper is organized as follows: Section 2 introduces an overview of related approaches for digit detection, shape detection, and time series classification. Section 3 describes the detailed building roof type detection steps. Section 4 presents experimental results and discussions. Section 5 summarizes the conclusion.

2. Related Work

Deep learning methods have gained popularity due to their effectiveness in detecting and classifying geometric shapes, handwritten shapes, and signals. The detection method uses 2D data with 2DCNNs or 1D data with 1DCNNs. This research reviews studies using 1DCNNs for detection, relevant for classifying 1D height profiles, and using 2DCNNs, relevant for 2D height profiles in binary images and two-spectral band images.

In recent years, much research has been conducted in data mining on signal and time series classification (TSC) using deep learning networks as a complex problem [18,19,20]. Since 2015, there has been a surge in the availability of temporal data, leading to the emergence of numerous TSC algorithms [20]. Krastev et al. [21] proposed a method for classifying GW signals into three classes: BBH, BNS, and noise using a 1DCNN. Huang and collaborators [22] developed a monitoring system for vibration signals. The raw signals are first generated, preprocessed, and divided into two series of signals at different time intervals (time domain—frequency domain). The classification accuracy is 99.8%. The results indicate that the 1DCNN has proper performance for classifying vibration signals. The network used can quickly learn different features; however, the architecture of the 1DCNN used does not have high flexibility against the number of classes. Lee and colleagues [23] used a CNN and an LSTM to extract features and classify power signals into normal and abnormal groups. The accuracy of this network for signal classification is 90%. In this research, we employ a 1DCNN to extract features and classify input 1D height profiles.

Wang et al. [24] proposed a method for identifying and tracking trains on railway tracks using structured light laser profiles and the Sketch-A-Net network for classification. The network extracts intrinsic profile features and matches profiles temporally and spatially for tracking. The approach addresses the lack of rich texture information and achieves a detection accuracy of 98.2%, outperforming other deep learning methods. Harp and colleagues [25] studied the classification of 1D synthetic time series signals using machine learning and deep learning algorithms. The input signal is transformed into 2D spectral images. In fact, by transforming the linear time series signal into a 2D image, a basis is created for using deep learning algorithms such as 2DCNNs. Deep learning networks, DPN, DenseNet, WRN, and ResNet, are used. The results show that all the methods used have high potential and are widely used in signal detection and analysis; the WRN network achieved the highest accuracy of 95.77%.

In character recognition (CR) from images, machine learning and deep learning algorithms have been used to solve challenging problems. CR has fundamental steps such as segmentation, feature extraction, and classification. In the domain of CR, deep learning algorithms have higher performance. They are a fast-advancing field among other machine learning models due to their feature extraction and best classifier characteristics. The CNN is an excellent tool for image detection because of the use of hidden layers. Several schemes have been proposed for the CR. However, it still faces challenging problems [26,27]. 2DCNNs have been less used to classify shapes from binary images. These images do not have brightness, color, or texture. The main problem in classifying these images is the presence of variations, edge connectivity, and hidden regions. In general, these images have two shape descriptors: area-based methods that use information from pixels in the image, and contour-based methods that use the shape of the contour [28].

Ghosh et al. [29] used a 2DCNN with a combination of cost functions and optimizers to classify a set of grayscale images of Japanese handwriting with dimensions of 28 × 28. They classified them with an accuracy of 96.13%. In this paper, the authors trained a proposed 2DCNN to classify grayscale images of handwritten digits with dimensions of 28 × 28. The proposed network achieved a classification accuracy of 99.21% and significantly reduced the computational time for training and evaluation [30]. Kalfas and collaborators [31] used AlexNet, VGG16, and VGG19 to classify regular and irregular 2D shapes in binary images. The dataset consisted of regular shapes that varied in non-random properties, or irregular and asymmetric shapes with curved or straight boundaries. The results indicate that the 2DCNNs AlexNet, VGG16, and VGG19, which were pre-trained for classifying natural images, performed better than networks trained with random weights from the beginning. The difference between the trained and untrained CNNs appeared in the deep convolutional layers, where the similarity between the shape-related response modulations of neurons and the trained CNNs was high.

While these studies demonstrate the application of deep learning networks for classification purposes, there is a gap in their effectiveness for detecting building roof types from height profiles; therefore, in this research, 1D and 2D convolutional neural networks are used to classify height profiles and detect building roof types.

3. Methods

In this study, we employ 2D and 1D convolutional neural networks (CNNs) to present an extensive framework for detecting building roof types based on height profiles. The primary procedures include data preparation, single-building extraction, profile generation, CNN training, and building roof type detection (Figure 2). First, nDSMs are generated by processing the LiDAR point cloud. Next, the nDSMs are used in the single-building extraction process to create sub-nDSMs. These sub-nDSMs are then used to generate height profiles in three formats: two binary images, a two-band spectral image, and two 1D vectors. The height profiles are manually labeled and used to train various neural networks for the classification task. Finally, the trained networks are applied to the test dataset to classify building roof types. In the proposed framework, vectors, two-band, and binary images are the inputs for different 1D- and 2D-trained networks. The following subsections summarize each step and its main components (Figure 2).

3.1. Data Preprocessing

Over the past two decades, various filtering methods have been extensively studied to produce a regularized 2D grid. This involved filtering building and non-building points. In this study, the points are classified using the Cloth Simulation Filtering (CSF) algorithm, which requires manual supervision [32]. A regularized grid, including ground points, tree points, and other features using non-building points, is created. Both rasters are generated based on classified points and linear interpolation, which is necessary to convert scattered building and non-building points into grid-based rasters [32]. The nDSM with high spatial resolution, representing the difference between the digital surface model (DSM) and the digital terrain model (DTM), is created using the LiDAR point cloud, and this makes the height of buildings absolute in the stage of producing height profiles to create suitable samples. In this study, the nDSM consists only of building points, and each roof is assumed to have a local maximum height. The accuracy of the nDSM generation has a direct effect on generating the height profiles. Local maximum points are identified using the geodesic dilation method [33], considering a specific height element. Following this, a median spatial filter is applied to produce a smoother output image and eliminate background noise from the image of local maximum points. A mask image with a specific threshold limit is then generated. Using the thinning algorithm [34], the central ridges in the building roofs are derived from the extracted local maximum points. For buildings with flat roofs that lack a central ridge line, the algorithm considers the middle axis of such roofs as the ridge line. Figure 3 shows the flowchart of the data preprocessing steps. As a post-processing step, the roof’s centroid is computed. Subsequently, the bounding box of each sub-nDSM is extracted based on the center and length of the largest direction of the ridge line on the building roof (Figure 3f). Since each sub-nDSM (Figure 3g) may contain portions or the entirety of adjacent buildings surrounding the central building, a post-processing step called single building extraction is applied to remove non-central buildings from the sub-nDSM. The nDSM of the Vaihingen dataset was generated with a spatial resolution of 25 cm.

3.2. Single Building Exatraction

In this step, the building located in the center of the sub-nDSM is selected as the target. Additional steps are then taken to eliminate any remaining parts of adjacent non-central buildings. During the single building extraction process, a matrix of the exact dimensions as the sub-nDSM is initialized with zeros, referred to as the null image (Figure 4b). The corresponding mask image for the sub-nDSM is generated following (Figure 4c). Using region properties analysis, the center, orientation, and border points of buildings in the mask image are identified (Figure 4d). The buildings in the mask image are labeled based on the identified border points (Figure 4e). The building located in the center of the mask image is selected as a target (Figure 4f), and its corresponding pixels in the null image are set to one, generating the mask image of the central building (Figure 4g). Finally, this mask is multiplied by the original sub-nDSM (Figure 4a) to create the final sub-nDSM that contains only the target building (Figure 4h).

3.3. Height Profile Generation

A projection-based method determines the optimal rotation angle to align roof faces with the Cartesian axes. Point cloud data from a buffer around the building center along the image axis are projected onto the XOZ and YOZ planes. forming 2D grids. The sub-nDSM is then rotated iteratively by 1°, and at each step, the number of empty grid cells is counted. The angle that yields the highest number of empty cells is selected as the rotation angle (Figure 5).

In the binary image-based method, 2D height profiles are transformed into two binary image formats for each sub-nDSM, enabling the application of deep learning algorithms such as the 2DCNN. Each binary image represents the geometric type of the 2D height profile of the building’s cross-section. In the two-spectral band image-based method, 2D height profiles are transformed into a single image format for each sub-nDSM. Since RGB images require at least three bands for proper storage, the third band is a matrix of zeros. Consequently, an image is created containing two bands of spectral information. The first and second bands represent the 2D height profile geometry of the first and second cross-sections of the building, respectively. The vector-based method stores 1D height profiles as two 1D vectors for each sub-nDSM. The longitudinal and transverse steps of the height profiles are not stored as they are fixed; therefore, only the height values are kept (Figure 6).

It is crucial to choose appropriate dimensions for saving 1D and 2D height profiles in image and vector formats to preserve the geometry of the height profiles and the information on cross-sections. Two-dimensional height profiles are saved with dimensions of 128 × 128 pixels in the binary image-based method and 128 × 128 × 3 pixels in the two-spectral band image-based method using the save.png method. A zero-padding technique [35] is used in the vector-based method by adding a zero value to both sides of 1D vectors to equalize the length of each vector, so two 1D vectors with a length of 570 are saved for each sub-nDSM.

In the context of photogrammetric analysis, various roof types exhibit distinct height profile geometries. In this research, there are some rules for saving height profiles. A building with a flat roof has two height profiles with quadrilateral geometry. A building with a shed roof has two height profiles with a quadrilateral geometry for the first cross-section and a convex quadrilateral geometry for the second cross-section. A building with a gable roof has two height profiles with a quadrilateral geometry for the first cross-section and a pentagonal geometry for the second cross-section. A building with a pinnacle roof has two height profiles with pentagonal geometry for both cross-sections. A building with a hip roof has two height profiles with pentagonal geometry for the first cross-section and a trapezoidal geometry for the second cross-section. A building with a mansard roof has two height profiles with trapezoidal geometry for both cross-sections. A building with a combined roof also has two height profiles with complex geometries for both cross-sections (Figure 7).

The operator labels the height profiles such that two height profiles of the building cross-section, the corresponding point cloud, and its orthoimage are displayed simultaneously. The geometry of each height profile is then determined. In challenging situations, labeling is performed based on the building’s ground truth. Once the geometries are specified, the building’s roof type is labeled accordingly (Figure 8, Figure 9 and Figure 10).

After counting the height profiles in each class, due to the lack of a sufficient number of buildings of each type of roof, there is a problem of unbalanced distribution in the classes, which is solved by data augmentation methods to avoid the problem of overfitting. Data augmentation is a method of generating additional samples from a dataset. It is the process of perturbing data by transformations that change its appearance while leaving the high-level information intact [8]. The data augmentation methods for the training data of height profiles, which are stored in two different forms of image and vector, differ from other standard methods used in CNNs. Therefore, choosing an efficient method for adding data according to the type of database is particularly important in this research.

Given the difficulty in precisely extracting the horizontal axis corresponding to the ridge line and the inevitable measurement errors associated with the center and main direction of a single building ridge line, these errors can be incorporated into the data augmentation process in network training. A maximum measurement error of four pixels for the center and two degrees for the main direction is considered during the single building extraction. To address this, a comprehensive database has been constructed based on the types of buildings under investigation. This method involves adjusting the center of single buildings by the measurement error and their orientation by the main direction error. It is also possible to increase the samples of the minimum class by using the approximation method to the central axis of the height profiles and adding them to the original dataset. How to generate more height profiles with exaggeration is shown in Figure 11 and Figure 12.

3.4. Roof Type Detection

This study addresses the classification optimization problem, where the network input comprises images or vector data, and the output corresponds to the class or label associated with the input data. To tackle the classification problem, various CNNs are examined and compared, including 2DCNNs (binary image-based and two-spectral band image-based methods) and 1DCNN (vector-based method). Each method’s classification of roof types encompasses seven categories: flat, shed, gable, pyramid, hip, mansard, and combined roofs (Figure 13).

Binary Image-Based: In the classification problem addressed by the binary image-based method, 2DCNNs are trained with dual objectives. Initially, the geometry of each 2D height profile, in the format of two binary images, is estimated individually from the building’s cross-section. The classification encompasses five geometric classes: quadrilateral, convex quadrilateral, pentagonal, trapezoidal, and complex geometry. Subsequently, a decision-making algorithm classifies the building’s roof type based on the two estimated geometries (Figure 14).
Two-Spectral Band Image-Based: In this method, trained 2DCNNs directly classify the building roof type from 2D height profiles, represented as a two-band image, and assign a specific label corresponding to the building roof class for the input data (Figure 15).
Vector-Based: In this method, the trained 1DCNN directly classifies the building roof type from 1D height profiles, represented as two 1D vectors (Figure 16).

4. Experimental Results and Discussion

4.1. Description of the Datasets

In this study, we used two benchmark datasets of the city of Vaihingen and a region of Potsdam in Germany provided by ISPRS [36,37]. The Vaihingen dataset spans approximately 242.15 hectares of the German city of Vaihingen, while the Potsdam dataset covers an area of 19,565 square meters in Potsdam, Germany. These datasets contain various types of roofs, with a significant portion of buildings having complex roofs. Initially, a DSM and a DTM were created from the LiDAR point clouds of Vaihingen, Germany, with a spatial resolution of 25 cm, and for Potsdam, Germany, with a spatial resolution of 5 cm. The Vaihingen dataset accuracy is about 10 cm in both planimetry and height [38]. From the difference between the DSM and the DTM, an nDSM was produced with a spatial resolution of 25 cm for the city of Vaihingen and 5 cm for an area of the city of Potsdam (Figure 17). We implemented all processing and calculations for the training part of the CNNs, and the detection of building roof types in each proposed method on a 64-bit operating system, 24 GB of memory, an Intel^® Core i7-6800K CPU @ 3.40 GHz (Intel Corporation, Santa Clara, CA, USA), and with an NVIDIA GeForce GTX 1080 Ti GPU (NVIDIA Corporation, Santa Clara, CA, USA). The experiments were conducted using Python 3.10 and TensorFlow 2.9, providing stable and efficient model development and a deployment environment. Most of the data was used to generate the training dataset, which contained 75% of the height profiles from Vaihingen. The training dataset was divided into training data (75%) and validation data (25%). The training data were used to train the CNNs. In contrast, the validation data, a part of the training dataset not used in the training process, was used to calculate the accuracy and precision of the neural network training.

To solve the problem of unbalanced sample distribution across classes, data augmentation methods (explained in Section 3) were applied, generating additional training data that were added to the initial dataset. Two test datasets were used to evaluate the building roof type detection process. The first test dataset contained 20% of each roof type, from which height profiles were generated, and buildings in Vaihingen were labeled. The second test dataset consisted of a Potsdam region with architecture similar to Vaihingen’s. This dataset was used to examine the networks’ performance across different spatial regions, assess their robustness to noise and other roof complications, and evaluate their generalization capabilities on data from Potsdam, which they had not previously encountered.

4.2. CNN Training

The learning process of the CNNs started with the training data. In this study, due to the uniqueness of the data types used in each method, training was conducted from scratch with random initial values for the weight vectors, and pre-trained networks were not used. Accordingly, comprehensive training datasets—comprising various geometries for the binary image-based method, roof types for the two-spectral band image-based method, and diverse vectors for the vector-based method—were created to train the millions of parameters in each model under study. The processing time for training the CNNs was high, and advanced hardware resources were required to ensure network convergence and achieve the desired accuracy. In this study, feature extraction and classification tasks were performed using seven 2D CNNs—Inception-V3 [39], EfficientNet-B0 [32], EfficientNet-B7 [40], InceptionResNet-V2 [41], ResNet-50 [42], MobileNet [43], and DenseNet-201 [44] for the binary image-based and two-spectral band image-based methods. A 1D CNN was used for the vector-based method.

The 1D CNN used in this research consisted of a main block with a convolutional layer, followed by a batch normalization layer and an activation function layer. The convolutional layer was applied using three 1D kernels of fixed size and no stride. This basic convolution block was repeated three times, each with a filter size of 64. Batch normalization was applied to accelerate convergence and improve generalization. After the convolution blocks, the extracted features were passed to a global average pooling layer instead of a fully connected layer, significantly reducing the number of trainable weights. The final classification was performed using a softmax layer. The number of trainable parameters in the 1D CNN was approximately 26,000 (Figure 18). The optimization functions ADAM, SGD, and RMSprop were used to train the weight and bias vectors corresponding to the training data. The softmax classifier function was employed to classify the geometries and roof types. This function calculates the probability of each sample belonging to each class and returns normalized values between zero and one. The optimizer functions were selected by optimizing hyperparameters using the validation dataset. Detailed results are presented in the Results section.

Due to the unbalanced training data distribution in both the 1D CNN and 2D CNN, specific weights were assigned based on the number of samples in each class [45]. Traditional weighting methods, which use predefined functions to calculate class weights, typically assign lower weights to minority class samples [46]. To improve the model’s sensitivity to underrepresented classes, minority class samples were assigned higher weights in this study (Equation (1)) [47]. This approach ensures that the decision boundary is tuned to classify these data correctly and that the model prioritizes minority class samples during training [48]. As a result, the network was compelled to minimize the classification error for these samples due to the increased cost associated with their higher weight [49].

w_{j} = \frac{\frac{1}{N_{j}}}{\sum_{i = 1}^{j = n} \frac{1}{N_{j}}}

(1)

4.3. Evaluation Metrics

Common indices and standard metrics such as precision, accuracy, completeness, F1-score, and Intersection over Union (IoU) [50] are used to evaluate the accuracy of roof type detection. These metrics are calculated manually based on the number of true positives, false positives, true negatives, and false negatives obtained from each class’s confusion matrix. The confusion matrix shows how the classification algorithm assigns input samples to different classes.

4.4. Investigating the Eeffect of Tuning Hyperparameters

After data augmentation, the training data for Vaihingen included 46,000 samples. Before model development, critical hyperparameters that influence the CNN training process were identified. These parameters included the number of epochs and the batch size (i.e., the number of samples processed per iteration). If the number of epochs is set too high, overfitting can occur, and processing time increases. The learning process remains incomplete if the number of epochs is too low. In this study, 25 epochs were used to train the networks. Batch size is another important parameter, which can range between one and the total number of training samples. A larger batch size requires a more powerful processing system. In this study, considering the system’s processing power, a batch size of 16 was used for the binary image-based and two-spectral band image-based methods, and a batch size of 32 was used for the vector-based method.

According to previous studies [28], the learning rate and optimization functions are among the key hyperparameters that affect training outcomes. This study tested learning rates of 0.01, 0.001, and 0.0001, along with three optimization functions: Adam, RMSprop, and SGD. Because the CNNs were trained from scratch and the models were stored for later evaluation, determining the optimal learning rate and optimizer function was essential. Therefore, this study’s most effective training parameters were selected by minimizing the cost function value on the validation dataset using the Keras Tuner framework. In addition to the aforementioned parameters, three activation functions (ReLU, Tanh, and Sigmoid) and the optimal kernel size for the 1D convolutional layers—critical components of 1D CNNs—were evaluated for the vector-based method. Table 1, Table 2 and Table 3 present the selected hyperparameters and optimization durations for training the networks in the binary image-based, two-spectral band image-based, and vector-based methods, respectively.

After determining the training parameters, each CNN studied using the respective methods was trained using the training data. The performance of the trained networks on the validation data was then evaluated, and accuracy and precision were calculated as percentages. The results of these evaluation metrics on the validation dataset for all methods are presented in Table 4.

4.5. Results of Building Roof Type Detection

The confusion matrix and standard evaluation metrics—accuracy, precision, recall, Intersection over Union (IoU), and F1-score—were used to evaluate roof type detection performance. Based on the optimally trained models, two sets of test datasets were employed to determine the most suitable method for detecting roof types from the height profiles of building cross-sections. Ultimately, the model that demonstrated the most consistent and optimal performance across all scenarios was selected as the basis for roof type detection from building height profiles.

4.5.1. Evaluating Neural Network Performance in Building Roof Type Detection

The Vaihingen test dataset, comprising 37 roofs, was utilized to evaluate the performance of the roof type detection methods trained on buildings from the city of Vaihingen. Table 5, Table 6 and Table 7 present the average results for the binary image-based, two-spectral band image-based, and vector-based methods. As shown in Table 5, for the binary image-based method, the DenseNet201 network achieved the highest Intersection over Union (IoU) score of 83.57% in the geometry type estimation step using height profiles, demonstrating superior performance over other trained networks. In the roof type detection step, based on the decision algorithm, the EfficientNet-B7 network outperformed the others, detecting building roof types from 2D height profiles stored in binary image format with a precision of 87.62%, an accuracy of 96.14%, and an F1-score of 81.9%. According to Table 6, in the two-spectral band image-based method, the InceptionV3 network achieved the highest recall (87.59%), accuracy (96.91%), IoU (79.09%), and F1-score (87.15%). Additionally, the EfficientNet-B7 network recorded the highest precision (92.38%) alongside an accuracy of 96.91%. In comparison, the DenseNet201 network also achieved an accuracy of 96.14%, effectively detecting roof types directly from 2D height profiles in the two-spectral band image format. Referring to Table 7, in the vector-based method, the 1DCNN achieved a precision of 84.52%, a recall of 84.73%, an accuracy of 95.37%, an IoU of 74.01%, and an F1-score of 83.74%. These results demonstrate the network’s effectiveness in identifying building roof types from 1D height profiles represented as two 1D vectors. Based on the results in Table 5, Table 6 and Table 7, all three proposed methods demonstrated robust performance in classifying and detecting building roof types. However, the networks implemented in the two-spectral band image-based method exhibited superior performance in detecting roof types within the Vaihingen test dataset. A second test dataset from a region in the city of Potsdam was employed to evaluate further the generalizability of the trained neural networks in each method and analyze the impact of noise and other complications on roof type detection accuracy.

4.5.2. Neural Network Performance with New Data, Impact of Noise, and Roof Complications

A second test dataset from the Potsdam region was used to investigate the performance of the trained neural networks on unseen data and assess the effects of noise and structural complications on roof type detection. This region includes buildings with structural complexities such as consecutive chimneys and dormers. Given the higher spatial resolution of the nDSM in Potsdam (5 cm), the corresponding height profiles were expected to exhibit more detailed features. Initially, the height profiles of building cross-sections were directly input into the trained networks from each method to evaluate their robustness in detecting roof types under the influence of such complications and noise. An analysis of these profiles revealed that structural complications and noise had significantly distorted the ideal geometry of the height profiles, making them deviate from their original shapes. To isolate the effects of these distortions, clean (complication-free) height profiles were subsequently generated for all three methods to serve as a baseline for comparison. Figure 19 illustrates three examples of height profiles from the Potsdam test dataset, demonstrating the observed effects across all methods.

Since the binary image-based method involves two objectives, the impact of structural complications and noise on both geometry type estimation and roof type detection was analyzed. Binary images of the height profiles from building cross-sections, before and after the removal of complications and noise, were fed into the 2DCNNs to estimate geometry types. Subsequently, a decision algorithm was applied to determine the roof type based on the two estimated geometries. Table 8 presents the average results for the original and cleaned binary images regarding geometry type estimation and roof type detection within the Potsdam test dataset. These results provide insights into the performance variations before and after removing complications and noise.

The results in Table 8 indicate that complications and noise, due to high distortions and changes in the geometric structure, caused interclass similarity in the surface geometry of the sections. Consequently, the trained network frequently misestimated the geometry type. These distortions substantially degraded the performance of the binary image-based method in geometry estimation. The incorrect geometry estimations also negatively affected the decision algorithm, resulting in misclassification of roof types. Overall, the results show that all networks in the binary image-based method were unstable in the presence of such disturbances. Once these complications were removed, the networks improved their performance detecting roof types.

In the two-spectral band image-based method, the impact of complications and noise on roof surfaces was directly assessed in the context of roof type detection. Table 9 presents the average results for the Potsdam test dataset, both before and after removing these disturbances. As shown in Figure 19a.1, a dormer structure on a hip roof causes the height profile to deviate from the ideal geometry. However, the DenseNet201 network was able to detect the building roof type with high accuracy, which indicates the robustness of the model in the face of structural complications and noise on the building’s roof.

Table 10 presents the average results for roof type detection using the vector-based method on the Potsdam test dataset, both before and after removing complications and noise. While the 1DCNN demonstrated effective performance in feature extraction, rapid training, and classification of 1D height profiles in the Vaihingen dataset, the results indicate a decline in performance when applied to the Potsdam dataset. Specifically, the 1DCNN exhibited poor accuracy regardless of whether complications and noise were present. These findings highlight the model’s limited generalizability and flexibility when exposed to data with height profile variations that differ from those in the training set.

4.5.3. Optimal Network Selection for Building Roof Detection Using Height Profiles

An analysis of the results presented in Table 8, Table 9 and Table 10 reveals a general decline in the performance of most networks when applied to the Potsdam test dataset. This reduction in accuracy can be attributed to the higher spatial resolution and increased detail of the nDSMs, which introduced additional complexity into the height profiles. Despite this challenge, the DenseNet201 network maintained robust performance in the two-spectral band image-based method, successfully classifying height profiles and detecting roof types in the Potsdam test dataset.

Its consistent accuracy across the Vaihingen and Potsdam test datasets demonstrates its generalization ability and resilience before and after removing roof complications and noise. Therefore, DenseNet201 can be considered the optimal model for detecting seven roof types—flat, shed, gable, pinnacle, hip, mansard, and combined—in the Vaihingen test dataset, and two types—gable and hip—in the Potsdam test dataset. The visual classification results of DenseNet201 are displayed in Figure 20, where the building cross-section height profiles are categorized by roof type.

According to the values presented in Table 11, the average Intersection over Union (IoU), precision, recall, accuracy, and F1-score for roof type detection from height profiles of building cross-sections in the form of two-spectral band image on the DenseNet201 network are 81.64%, 89.75%, 89.51%, 97%, and 89%, respectively. The results indicate significant accuracy compared to other trained networks in each method. The best-performing model’s inference and post-processing time, including metrics calculation, was 7.50 s for the Vaihingen dataset and 3.29 s for the Potsdam dataset.

4.6. Discussion

In this study, we presented three methods for detecting building roof types by generating height profiles from high-resolution digital surface models. We investigated the performance of different CNNs using 1D and 2D height profiles derived from the Vaihingen and Potsdam datasets to compare the effectiveness of various architectures. A database of height profiles from the cross-sections of buildings was generated and stored in the form of binary-based images, two-spectral band image-based data, and two 1D vectors. Based on the geometry type of the height profiles and ground truth, the samples were labeled and classified into five classes for the binary image-based method: quadrilateral, convex quadrilateral, pentagonal, trapezoidal, and complex geometries. For building roof type detection, they were classified into seven classes: flat, shed, gable, pinnacle, hip, mansard, and combined roofs.

The proposed methods for roof type detection were designed as supervised classification approaches, in which separate samples for each class were fed into the neural network algorithms. Classification was then performed based on these samples’ geometry and spatial signature. Although the training data generation process was automated, labeling the data was time-consuming. Future studies could explore unsupervised methods to reduce this dependence on manually labeled training data. By converting the 1D height profiles to 2D formats, we established a foundation for applying 2DCNN algorithms. However, selecting the appropriate image size is important to preserve spatial information while minimizing training time.

Some geometry types caused ambiguity in classification. For example, quadrilateral and convex roof profiles in the second cross-section could introduce interclass similarity between buildings with flat and shed roofs, especially when variations in geometry were minimal. Similarly, the pentagonal height profiles in the second cross-section, where the vertex covers more than five pixels, may be misclassified as hip roofs due to their similarity to trapezoidal shapes. This issue also applied to the first cross-section of hip and mansard roofs. When the height variations in the central profiles were significant and deviated from ideal geometric shapes, the network tended to classify them as combined roofs due to their complex structure.

Our results demonstrated that deep learning methods can automatically extract high-level features from minimal input data (i.e., height profiles) without auxiliary datasets. All three proposed methods exhibited the ability to classify and detect building roof types. Among them, the two-spectral band image-based method outperformed the others. The vector-based method effectively reduced data dimensionality, enabling roof type detection using 1D data instead of 2D or 3D. However, 1DCNNs with simpler architectures showed limited flexibility when applied to new samples with different height variations. To enhance generalization, it is necessary to include training data from diverse geographic regions.

The binary image-based method ranked third in performance, primarily because the accuracy of geometry type estimation directly influenced the accuracy of roof type detection, which was based on a decision algorithm. Moreover, this method was more susceptible to complications such as roof noise and distortions. Among all networks tested, the 2DCNN DenseNet201 demonstrated strong robustness to roof complications and performed well across different datasets, including new data from a different city. It successfully detected roof types despite distortions in profile geometry.

Several factors influence the final accuracy of the proposed method due to its hierarchical processing structure, where errors in each stage can propagate and affect subsequent steps. One of the primary sources of error is the inaccurate extraction of the local maximum height and the incorrect estimation of the building’s orientation and center, which can lead to the generation of non-central height profiles. Moreover, the presence of obstacles or dense vegetation on the building’s roof can distort the geometric structure, resulting in misclassification by some neural networks. Additional challenges arise when classifying training height profiles from roof types with similar shapes or subtle geometric differences, which increases interclass similarity and makes them harder to distinguish. The performance of most neural networks is also sensitive to the quality of the input data, the network architecture, and the selection of appropriate hyperparameters in each configuration.

5. Conclusions

In this study, we presented a novel approach to automatically generate, classify, and detect building roof types using height profiles derived from high-resolution digital surface models. Three methods were developed for roof type detection based on the height profiles of building cross-sections: (1) the binary image-based method, (2) the two-spectral band image-based method, and (3) the vector-based method. Depending on each method, the height profiles were stored as two binary images, an image with two spectral bands, or as two 1D vectors, respectively. A variety of convolutional neural networks (CNNs) were trained and evaluated. Seven 2DCNNs—InceptionV3, EfficientNetB0, EfficientNetB7, InceptionResNetV2, ResNet50, MobileNet, and DenseNet201—were employed to extract features and perform classification tasks for the binary and two-spectral band image-based methods. A 1DCNN was used for the vector-based method. The optimal hyperparameters for each network were determined by minimizing the cost function on the validation dataset.

Each method addressed a classification task. In the binary image-based method, binary images of height profiles were used as input to classify cross-section geometries into five categories (quadrilateral, convex quadrilateral, pentagonal, trapezoidal, and complex). A decision algorithm then used the two estimated geometries to determine the roof type. In the two-spectral band and vector-based methods, the inputs were either an image combining two height profiles or two 1D vectors, respectively, and the outputs directly corresponded to one of seven roof types: flat, shed, gable, pinnacle, hip, mansard, or combined. The results demonstrated that DenseNet201, when applied in the two-spectral band image-based method, achieved the highest performance with an accuracy of 97%. This network performed well on unseen data from another city and exhibited stability in the presence of roof complications and noise. It could accurately extract features, resolve interclass similarities in geometric shapes, and detect building roof types even under significant distortion in height profile geometry conditions.

Future work will focus on expanding the dataset by including more diverse and challenging data to train the networks for different elevation changes, thereby increasing their generalizability. Additionally, to overcome challenges related to the similarity between classes in height profile classification, using multiple height profiles per building could improve the accuracy of roof type detection. Alternative neural network architectures and other optimization techniques, such as AdamW, can also be investigated to improve the accuracy of building roof type detection based on height profiles. Since the proposed method is a supervised classification approach that relies on labeled samples for each class, although the training data were generated automatically, the manual labeling process remains time-consuming. Therefore, future research could explore unsupervised or semi-supervised learning methods to reduce the dependency on labeled training data and increase automation.

Author Contributions

Conceptualization, B.F. and H.A.; Methodology, B.F. and H.A.; Software, B.F.; Validation, B.F.; Formal analysis, B.F. and H.A.; Investigation, B.F. and H.A.; Resources, B.F.; Data curation, B.F.; Writing—original draft, B.F.; Writing—review & editing, B.F. and H.A.; Visualization, B.F.; Supervision, H.A.; Project administration, H.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Farajelahi, B.; Najaf, M.; Arefi, H. Comparing the Performance of Roof Segmentation Methods in an Urban Environment Using Digital Elevation Data. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, 10, 165–170. [Google Scholar] [CrossRef]
Wichmann, A.; Jung, J.; Sohn, G.; Kada, M.; Ehlers, M. Integration of Building Knowledge into Binary Space Partitioning for the Reconstruction of Regularized Building Models. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2015, 2, 541–548. [Google Scholar] [CrossRef]
Sahebdivani, S.; Arefi, H.; Maboudi, M. Rail Track Detection and Projection-Based 3D Modeling from UAV Point Cloud. Sensors 2020, 20, 5220. [Google Scholar] [CrossRef] [PubMed]
Hu, P.; Miao, Y.; Hou, M. Reconstruction of Complex Roof Semantic Structures from 3D Point Clouds Using Local Convexity and Consistency. Remote Sens. 2021, 13, 1946. [Google Scholar] [CrossRef]
Partovi, T.; Krauß, T.; Arefi, H.; Omidalizarandi, M.; Reinartz, P. Model-Driven 3D Building Reconstruction Based on Integration of DSM and Spectral Information of Satellite Images. In Proceedings of the 2014 IEEE Geoscience and Remote Sensing Symposium, Quebec City, QC, Canada, 13–18 July 2014; pp. 3168–3171. [Google Scholar]
Li, L.; Yao, J.; Tu, J.; Liu, X.; Li, Y.; Guo, L. Roof Plane Segmentation from Airborne LiDAR Data Using Hierarchical Clustering and Boundary Relabeling. Remote Sens. 2020, 12, 1363. [Google Scholar] [CrossRef]
Albano, R. Investigation on Roof Segmentation for 3D Building Reconstruction from Aerial LIDAR Point Clouds. Appl. Sci. 2019, 9, 4674. [Google Scholar] [CrossRef]
Sallab, A.E.; Sobh, I.; Zahran, M.; Essam, N. LiDAR Sensor Modeling and Data Augmentation with GANs for Autonomous Driving. arXiv 2019, arXiv:1905.07290. [Google Scholar] [CrossRef]
Li, Y.; Wu, B. Automatic 3D Reconstruction of Complex Buildings from Incomplete Point Clouds with Topological-Relation Constraints. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 5, 85–92. [Google Scholar] [CrossRef]
Boltcheva, D.; Basselin, J.; Poull, C.; Barthélemy, H.; Sokolov, D. Topological-Based Roof Modeling from 3D Point Clouds. J. WSCG 2020, 28, 137–146. [Google Scholar] [CrossRef]
Rottensteiner, F.; Trinder, J.; Clode, S.; Kubik, K.; Lovell, B. Building Detection by Dempster-Shafer Fusion of LIDAR Data and Multispectral Aerial Imagery. In Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, Cambridge, UK, 26 August 2004; Volume 2, pp. 339–342. [Google Scholar]
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv 2016, arXiv:1506.01497. [Google Scholar] [CrossRef] [PubMed]
Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787. [Google Scholar]
Xie, X.; Lang, C.; Miao, S.; Cheng, G.; Li, K.; Han, J. Mutual-Assistance Learning for Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 15171–15184. [Google Scholar] [CrossRef] [PubMed]
Ruder, S. An Overview of Multi-Task Learning in Deep Neural Networks. arXiv 2017. [Google Scholar] [CrossRef]
Vosselman, G.; Maas, H.-G. Airborne and Terrestrial Laser Scanning; Whittles Publishing: Dunbeath, UK, 2010; ISBN 978-1-904445-87-6. [Google Scholar]
Yang, Q.; Wu, X. 10 Challenging Problems in Data Mining Research. Int. J. Inf. Technol. Decis. Mak. 2006, 05, 597–604. [Google Scholar] [CrossRef]
ACM Computing Surveys. Time-Series Data Mining. Available online: https://dl.acm.org/doi/10.1145/2379776.2379788 (accessed on 17 August 2023).
Bagnall, A.; Lines, J.; Bostrom, A.; Large, J.; Keogh, E. The Great Time Series Classification Bake off: A Review and Experimental Evaluation of Recent Algorithmic Advances. Data Min. Knowl. Discov. 2017, 31, 606–660. [Google Scholar] [CrossRef] [PubMed]
Krastev, P.G.; Gill, K.; Villar, V.A.; Berger, E. Detection and Parameter Estimation of Gravitational Waves from Binary Neutron-Star Mergers in Real LIGO Data Using Deep Learning. Phys. Lett. B 2021, 815, 136161. [Google Scholar] [CrossRef]
Huang, C.-Y.; Dzulfikri, Z. Stamping Monitoring by Using an Adaptive 1D Convolutional Neural Network. Sensors 2021, 21, 262. [Google Scholar] [CrossRef] [PubMed]
Lee, J.-H.; Kang, J.; Shim, W.; Chung, H.-S.; Sung, T.-E. Pattern Detection Model Using a Deep Learning Algorithm for Power Data Analysis in Abnormal Conditions. Electronics 2020, 9, 1140. [Google Scholar] [CrossRef]
Wang, S.; Wang, H.; Zhou, Y.; Liu, J.; Dai, P.; Du, X.; Abdel Wahab, M. Automatic Laser Profile Recognition and Fast Tracking for Structured Light Measurement Using Deep Learning and Template Matching. Measurement 2021, 169, 108362. [Google Scholar] [CrossRef]
Harp, G.R.; Richards, J.; Tarter, S.S.J.C.; Mackintosh, G.; Scargle, J.D.; Henze, C.; Nelson, B.; Cox, G.A.; Egly, S.; Vinodababu, S.; et al. Machine Vision and Deep Learning for Classification of Radio SETI Signals. arXiv 2019. [Google Scholar] [CrossRef]
Mohebi, E.; Bagirov, A. A Convolutional Recursive Modified Self Organizing Map for Handwritten Digits Recognition. Neural Netw. 2014, 60, 104–118. [Google Scholar] [CrossRef] [PubMed]
Boukharouba, A.; Bennia, A. Novel Feature Extraction Technique for the Recognition of Handwritten Digits. Appl. Comput. Inform. 2017, 13, 19–26. [Google Scholar] [CrossRef]
Baker, N.; Lu, H.; Erlikhman, G.; Kellman, P.J. Deep Convolutional Networks Do Not Classify Based on Global Object Shape. PLOS Comput. Biol. 2018, 14, e1006613. [Google Scholar] [CrossRef] [PubMed]
Ghosh, A.; Mukherjee, A.; Ghosh, C. Simplistic Deep Learning for Japanese Handwritten Digit Recognition. In Science and Technology, Proceedings of the Intelligent Techniques and Applications, Siliguri, India, 20–21 September 2019; Dawn, S., Balas, V.E., Esposito, A., Gope, S., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 87–93. [Google Scholar]
Ali, S.; Shaukat, Z.; Azeem, M.; Sakhawat, Z.; Mahmood, T.; ur Rehman, K. An Efficient and Improved Scheme for Handwritten Digit Recognition Based on Convolutional Neural Network. SN Appl. Sci. 2019, 1, 1125. [Google Scholar] [CrossRef]
Kalfas, I.; Vinken, K.; Vogels, R. Representations of Regular and Irregular Shapes by Deep Convolutional Neural Networks, Monkey Inferotemporal Neurons and Human Judgments. PLOS Comput. Biol. 2018, 14, e1006557. [Google Scholar] [CrossRef] [PubMed]
Zhang, W.; Qi, J.; Wan, P.; Wang, H.; Xie, D.; Wang, X.; Yan, G. An Easy-to-Use Airborne LiDAR Data Filtering Method Based on Cloth Simulation. Remote Sens. 2016, 8, 501. [Google Scholar] [CrossRef]
Arefi, H.; Hahn, M.; Reinartz, P. Ridge Based Decomposition of Complex Buildings for 3D Model Generation from High Resolution Digital Surface Models. In Proceedings of the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Istanbul, Turkey, 11–13 October 2010; Volume 34. [Google Scholar]
Abu-Ain, W.; Abdullah, S.N.H.S.; Bataineh, B.; Abu-Ain, T.; Omar, K. Skeletonization Algorithm for Binary Images. Procedia Technol. 2013, 11, 704–709. [Google Scholar] [CrossRef]
Al-Jawhar, Y.A.; Ramli, K.N.; Taher, M.A.; Mohd Shah, N.S.; Audah, L.; Ahmed, M.S. Zero-Padding Techniques in OFDM Systems. Int. J. Electr. Eng. Inform. 2018, 10, 704–725. [Google Scholar] [CrossRef]
2D Semantic Labeling Contest—Vaihingen. Available online: https://seafile.projekt.uni-hannover.de/f/6a06a837b1f349cfa749/ (accessed on 7 September 2021).
2D Semantic Labeling Contest—Potsdam. Available online: https://seafile.projekt.uni-hannover.de/f/429be50cc79d423ab6c4/ (accessed on 7 September 2021).
Rottensteiner, F.; Sohn, G.; Jung, J.; Gerke, M.; Baillard, C.; Benitez, S.; Breitkopf, U. The ISPRS Benchmark on Urban Object Classification and 3D Building Reconstruction. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, I-3, 293–298. [Google Scholar] [CrossRef]
Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. arXiv 2015. [Google Scholar] [CrossRef]
Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2020. [Google Scholar] [CrossRef]
Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. arXiv 2016, arXiv:1602.07261. [Google Scholar] [CrossRef]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar] [CrossRef]
Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. arXiv 2016, arXiv:1608.06993. [Google Scholar] [CrossRef]
Hoens, T.R.; Chawla, N.V. Imbalanced Datasets: From Sampling to Classifiers. In Imbalanced Learning; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2013; pp. 43–59. ISBN 978-1-118-64610-6. [Google Scholar]
Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980. [Google Scholar] [CrossRef]
Japkowicz, N.; Stephen, S. The Class Imbalance Problem: A Systematic Study1. Intell. Data Anal. 2002, 6, 429–449. [Google Scholar] [CrossRef]
Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques; Elsevier: Amsterdam, The Netherlands, 2011; ISBN 978-0-12-381480-7. [Google Scholar]
Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
Everingham, M.; Eslami, S.M.A.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes Challenge: A Retrospective. Int. J. Comput. Vis. 2015, 111, 98–136. [Google Scholar] [CrossRef]

Figure 1. Roofs model library: (a) flat; (b) shed; (c) gable; (d) pinnacle; (e) hip; (f) mansard; (g) combined.

Figure 2. The main steps of the proposed method.

Figure 3. The flowchart of data preprocessing, showing (a) Input LiDAR point cloud; (b) non-building point cloud; (c) generation of a normalized digital surface model (nDSM); (d) extraction of local maxima; (e) ridge line extraction; (f) extraction of sub-nDSM bounding boxes; and (g) examples of sub-nDSM segments.

Figure 4. The flowchart of the single building extraction process. (a) sub-nDSM. (b) null image. (c) mask image. (d) building border representation on the mask image. (e) labeled buildings in the mask image. (f) central building extraction. (g) mask image of the central building. (h) final sub-nDSM containing only the target building.

Figure 5. Visualization of the sub-nDSM. (1) projection of the point cloud onto the XOZ plane before rotation. (2) projection of the point cloud onto the XOZ plane after applying the rotation angle.

Figure 6. The flowchart of height profile generation: (a) sub-nDSM; (b) oriented sub-nDSM; (c) first and second height profiles in binary image format; (d) height profiles in a two-spectral band image format; (e) height profiles as two 1D vectors.

Figure 7. Order and geometries of height profiles for various roof types: (1) flat. (2) shed. (3) gable. (4) pinnacle. (5) hip. (6) mansard. (a) the first cross-section. (b) the second cross-section.

Figure 8. Geometry type library for the binary image-based method: (1) quadrilateral. (2) convex quadrilateral. (3) pentagonal. (4) trapezoidal. (5) complex geometries.

Figure 9. Roof type library for the two-spectral band image-based method: (1) flat. (2) shed. (3) gable. (4) pinnacle. (5) hip. (6) mansard. (7) combined.

Figure 10. Roof type library for the vector-based method: (1) flat. (2) shed. (3) gable. (4) pinnacle. (5) hip. (6) mansard. (7) combined.

Figure 11. Presentation of the sub-nDSM of a gable roof. (a) non-oriented sub-nDSM. (b) oriented sub-nDSM by the major orientation angle. (c) oriented sub-nDSM by both major and minor orientation angles.

Figure 12. Three types of height profiles with pentagonal geometry: (a) pentagonal geometry without applying a rotation angle. (b) pentagonal geometry by applying the major rotation angle. (c) pentagonal geometry by applying the major and minor rotation angles.

Figure 13. Building roof type detection using height profiles.

Figure 14. Flowchart of roof type detection using binary image-based method: (a) quadrilateral. (b) convex quadrilateral. (c) pentagonal. (d) trapezoidal. (e) complex geometries.

Figure 15. Flowchart of roof type detection using two-spectral image-based method: (a) flat. (b) shed. (c) gable. (d) pinnacle. (e) hip. (f) mansard. (g) combined roofs.

Figure 16. Flowchart of roof type detection using vector-based method: (a) flat. (b) shed. (c) gable. (d) pinnacle. (e) hip. (f) mansard. (g) combined roofs.

Figure 17. Study dataset: (a) a region of Vaihingen. (b) a region of Potsdam.

Figure 18. The archtiecture of the 1DCNN.

Figure 19. View of (a) orthophoto image, (b) sub-nDSM, (c) height profiles of the first cross-section in binary image format, (d) height profiles of the second cross-section in binary image format, (e) height profiles of the cross-sections in two-spectral band image format, and (f) height profiles of the cross-sections in two 1D vectors formats. (1) with complications and noise. (2) without complications and noise.

Figure 20. Classification of seven roof types by DenseNet201.

Table 1. Hyperparameters used for the 2DCNN in the binary image-based method.

2DCNN	Optimizer	Learning Rate	Batch Size	Loss	Processing Time
EfficientNet-B0	ADAM	0.01	16	0.540	4 h 39 m 53 s
EfficientNet-B7	ADAM	0.001	16	0.550	18 h 09 m 20 s
DenseNet-201	ADAM	0.01	16	0.443	8 h 07 m 10 s
MobileNet	ADAM	0.01	16	0.493	1 h 32 m 9 s
Inception-V3	SGD	0.01	16	0.480	4 h 15 m 16 s
ResNet-50	RMSprop	0.001	16	0.468	4 h 02 m 23 s
InceptionResNetV2	RMSprop	0.001	16	0.450	9 h 20 m 42 s

Table 2. Hyperparameters used for the 2DCNN in the two-spectral band image-based method.

2DCNN	Optimizer	Learning Rate	Batch Size	Loss	Processing Time
EfficientNet-B0	ADAM	0.01	16	0.301	6 h 13 m 36 s
EfficientNet-B7	ADAM	0.001	16	0.345	24 h 02 m 50 s
DenseNet-201	RMSprop	0.001	16	0.241	11 h 58 m 49 s
MobileNet	ADAM	0.01	16	0.292	2 h 07 m 30 s
Inception-V3	ADAM	0.001	16	0.256	5 h 41 m 21 s
ResNet-50	ADAM	0.001	16	0.288	5 h 30 m 18 s
InceptionResNetV2	ADAM	0.001	16	0.233	12 h 31 m 31 s

Table 3. Hyperparameters used for the 1DCNN in the vector-based method.

1DCNN	Optimizer	Learning Rate	Batch Size	Loss	Kernel Size	Activation Function	Processing Time
	ADAM	0.001	32	0.890	9	ReLU	5 h 17 m 8 s

Table 4. Average results of evaluation criteria on validation data.

Methods	Neural Networks	Precision (%)	Accuracy (%)
Binary Image-Based Method	EfficientNet-B0	78.45	93.74
	EfficientNet-B7	77.24	93.7
	DenseNet-201	79.01	93.94
	MobileNet	80.31	94.38
	Inception-V3	78.53	94.05
	ResNet-50	78.48	93.79
	InceptionResNet-V2	80.02	94.4
Two-Spectral Band Image-Based Method	EfficientNet-B0	86.2	96.9
	EfficientNet-B7	85.28	96.46
	DenseNet-201	88.69	97.3
	MobileNet	88.21	97.35
	Inception-V3	87.14	97.01
	ResNet-50	84.45	96.39
	InceptionResNet-V2	88.03	97.25
Vector-Based Method	1DCNN	70.77	92.79

Table 5. Average results of the binary image-based method on the Vaihingen test dataset.

2DCNN	Roof Type				Geometry Type
2DCNN	Precision (%)	Recall (%)	Accuracy (%)	F1-Score (%)	IoU (%)
EfficientNet-B0	80.79	75.19	93.05	75.89	72.21
EfficientNet-B7	87.62	78.18	96.14	81.90	80.59
DenseNet-201	83.54	79.83	94.21	80.72	83.57
MobileNet	83.14	77.23	94.21	78.65	71.63
Inception-V3	82.82	79.12	94.21	80.16	77.95
ResNet-50	85.24	77.23	94.59	79.30	70.75
InceptionResNet-V2	74.61	65.32	91.89	67.69	62.86

Table 6. Average results of the two-spectral band image-based method on the Vaihingen test dataset.

2DCNN	Roof Type
2DCNN	Precision (%)	Recall (%)	Accuracy (%)	F1-Score (%)	IoU (%)
EfficientNet-B0	75.80	74.47	92.28	73.85	59.86
EfficientNet-B7	92.38	84.86	96.91	85.53	77.24
DenseNet-201	86.83	86.51	96.14	85.86	76.39
MobileNet	82.62	82.69	94.60	81.64	72.38
Inception-V3	87.70	87.59	96.91	87.15	79.09
ResNet-50	86.05	86.05	96.14	85.16	75.95
InceptionResNet-V2	82.26	82.13	95.37	81.68	70.50

Table 7. Average results of the vector-based method on the Vaihingen test dataset.

1DCNN	Roof Type
1DCNN	Precision (%)	Recall (%)	Accuracy (%)	F1-Score (%)	IoU (%)
	84.52	84.73	95.37	83.74	74.01

Table 8. Average results of the binary image-based method on the Potsdam test dataset.

2DCNN	Roof Type								Geometry Type
	Precision (%)		Recall (%)		Accuracy (%)		F1-Score (%)		IoU (%)
	Before	After	Before	After	Before	After	Before	After	Before	After
EfficientNet B0	75	66.67	62.50	81.25	61.11	72.23	53.34	63.46	54.86	87.50
EfficientNet B7	50	66.67	50	81.25	50	72.23	66.67	63.46	33.33	89.58
DenseNet201	75	66.67	56.25	81.25	55.56	72.23	44.45	63.46	62.50	89.58
MobileNet	66.67	66.67	68.75	87.5	61.11	77.78	52.28	67.86	64.58	100
InceptionV3	66.67	66.67	68.75	87.5	61.11	77.78	52.28	67.86	64.58	91.67
ResNet50	75	66.67	56.25	81.25	55.56	72.23	44.45	63.46	52.78	89.58
InceptionResNetV2	75	66.67	62.50	81.25	61.11	72.23	53.34	63.46	81.25	89.58

Table 9. Average results of the two-spectral band image-based method on the Potsdam test dataset.

2DCNN	Roof Type
	Precision (%)		Recall (%)		Accuracy (%)		F1-Score (%)		IoU (%)
	Before	After	Before	After	Before	After	Before	After	Before	After
EfficientNet B0	100	100	68.75	93.75	72.22	94.45	77.28	96.67	68.75	93.75
EfficientNet B7	100	100	68.75	93.75	72.22	94.45	77.28	96.67	68.75	93.75
DenseNet201	100	100	100	100	100	100	100	100	100	100
MobileNet	100	100	68.75	100	72.22	100	77.28	100	68.75	100
InceptionV3	100	100	75	100	77.78	100	83.34	100	75	100
ResNet50	100	100	75	75	77.78	77.78	83.34	83.34	75	75
InceptionResNetV2	100	100	81.25	93.75	83.34	94.45	88.46	96.67	81.25	93.75

Table 10. Average results of the vector-based method on the Potsdam test dataset.

Roof Type
Precision (%)		Recall (%)		Accuracy (%)		F1-Score (%)		IoU (%)
Before	After	Before	After	Before	Before	After	Before	After	Before
50	50	50	50	55.56	50	50	50	50	55.56

Table 11. Average results of the Densenet201 on the test dataset, separated by each class.

Dataset	Roof Type	TP	TN	FP	FN	IoU (%)	Precision (%)	Recall (%)	Accuracy (%)	F1-Score (%)
Vaihingen Test Dataset	Flat	4	32	1	0	80	80	100	97.30	88.89
	Shed	4	32	0	1	80	100	80	97.30	88.89
	Gable	5	32	0	0	100	100	100	100	100
	Pinnacle	5	30	0	2	71.43	100	71.43	94.59	83.33
	Hip	7	27	2	1	70	77.78	87.50	91.89	82.35
	Mansard	2	33	1	1	50	66.67	66.67	94.59	66.67
	Combined	5	31	1	0	83.33	83.33	100	97.30	90.91
Potsdam Test Dataset	Gable	1	8	0	0	100	100	100	100	100
Potsdam Test Dataset	Hip	8	1	0	0	100	100	100	100	100
Average						81.64	89.75	89.51	97	89

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Farajelahi, B.; Arefi, H. Profile-Based Building Detection Using Convolutional Neural Network and High-Resolution Digital Surface Models. Remote Sens. 2025, 17, 2496. https://doi.org/10.3390/rs17142496

AMA Style

Farajelahi B, Arefi H. Profile-Based Building Detection Using Convolutional Neural Network and High-Resolution Digital Surface Models. Remote Sensing. 2025; 17(14):2496. https://doi.org/10.3390/rs17142496

Chicago/Turabian Style

Farajelahi, Behaeen, and Hossein Arefi. 2025. "Profile-Based Building Detection Using Convolutional Neural Network and High-Resolution Digital Surface Models" Remote Sensing 17, no. 14: 2496. https://doi.org/10.3390/rs17142496

APA Style

Farajelahi, B., & Arefi, H. (2025). Profile-Based Building Detection Using Convolutional Neural Network and High-Resolution Digital Surface Models. Remote Sensing, 17(14), 2496. https://doi.org/10.3390/rs17142496

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Profile-Based Building Detection Using Convolutional Neural Network and High-Resolution Digital Surface Models

Abstract

1. Introduction

2. Related Work

3. Methods

3.1. Data Preprocessing

3.2. Single Building Exatraction

3.3. Height Profile Generation

3.4. Roof Type Detection

4. Experimental Results and Discussion

4.1. Description of the Datasets

4.2. CNN Training

4.3. Evaluation Metrics

4.4. Investigating the Eeffect of Tuning Hyperparameters

4.5. Results of Building Roof Type Detection

4.5.1. Evaluating Neural Network Performance in Building Roof Type Detection

4.5.2. Neural Network Performance with New Data, Impact of Noise, and Roof Complications

4.5.3. Optimal Network Selection for Building Roof Detection Using Height Profiles

4.6. Discussion

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI