Building Extraction Using Orthophotos and Dense Point Cloud Derived from Visual Band Aerial Imagery Based on Machine Learning and Segmentation

The urban-sprawl-related increase of built-up areas requires reliable monitoring methods, and remote sensing can be an efficient technique. Aerial surveys, with high spatial resolution, provide detailed data for building monitoring, but archive images usually have only visible bands. We aimed to reveal the efficiency of visible orthophotographs and photogrammetric dense point clouds in building detection with segmentation-based machine learning (with five algorithms) using visible bands, texture information, and spectral and morphometric indices in different variable sets. Random forest (RF) usually had the best overall accuracy (99.8%) and partial least squares the worst (~60%). We found that >95% accuracy can be gained even at class level. Recursive feature elimination (RFE) was an efficient variable selection tool; its result with six variables was similar to that obtained with all 31 available variables. Morphometric indices had 82% producer's and 85% user's accuracy (PA and UA, respectively), and combining them with spectral and texture indices made the largest contribution to the improvement. However, morphometric indices are not always available; by adding texture and spectral indices to the red-green-blue (RGB) bands, the PA improved by 12% and the UA by 6%. Building extraction from visual-band aerial surveys can be accurate, and archive images can be involved in the time series of a monitoring.


Introduction
Cities change dynamically in area and appearance due to accelerated urbanization, in accordance with the increasing population of the world [1]. The ratio of urban population was 55% in 2018; it has almost doubled since 1960 [2]. The trend shows linear growth according to the forecasts: the urban population will be 68% in 2050 and 85% in 2100 [3]. This process is a key factor of accelerated urban sprawl and the pace of building construction. LiDAR-based building detection exploits the properties of returning laser pulses: first and last echo, intensity, point density, and geometry attributes [7,34,58,59]. Building detection can also be performed with aerial images; the capabilities of digital image processing are widely used to analyze urban areas, using image thresholding, filtering methods, and morphological operations [60]. The RGB range is often transformed into other color spaces [6,61] by converting or normalizing the pixel intensity values of raster bands, then combining them to enhance the appearance of building objects and analyze the orthophotos in spectral terms. Many building extraction workflows apply shape and geometry information to determine the location of buildings. Edge filtering methods help to extract shape information by detecting the changes of pixel intensities caused by differing relative heights [31]; thus, the edges of building objects are clearly detectable in non-shaded areas. Several studies include this approach, calculating feature attributes assigned to segments or using pixel-wise methods to define land cover categories, processed with machine learning (ML) and statistical modeling methods [26,32]. In most cases, the obtained building objects are converted into vector data, which require noise reduction, generalization, and smoothing techniques to gain the building footprints [25] (i.e., the outline of the roof).
Although a large number of publications have focused on the classification of the urban environment using aerial images, a comprehensive comparison of the effect of different types of input data and classification algorithms on accuracy has not been conducted. Accordingly, our aim was to perform a land cover classification focusing on building detection based on RGB aerial photographs using ML classification. We performed the analysis on five levels: (i) using only spectral data (RGB), (ii) adding textural features, (iii) adding morphometric indices derived from the DSM, (iv) adding visual band (i.e., RGB) spectral indices, and (v) different combinations of the spectral, textural, and DSM-related data. We tested the classification performance of five ML algorithms. Overall, we aimed to reveal the best set of variables suitable for an urban area to efficiently extract buildings using a data fusion approach and to achieve the most accurate classification.

Study Area
The study area was located in the north-western part of Debrecen, the second largest city of Hungary (Figure 1). It is a suburban area of 76 ha, built up during the last 40 years, characterized by detached and terraced houses and some blocks of flats. The area is at the NW edge of the city; thus, air quality is good and free of traffic-related pollution; furthermore, there are no industrial pollution sources. The area is densely built-up, and due to the dense vegetation several buildings are partially covered by foliage. A wide variety of roofing materials can be identified (different colors of concrete and bitumen shingles, asbestos cement, metal roofs, etc.), in many cases combined with solar panels [10]. Therefore, many factors affected the accuracy of the classifications: the covering foliage of trees, the color and aging of the roofing materials, roof windows, and solar panels; furthermore, due to the clean air, roofs often had a relevant amount of lichens and mosses.

Data
Nadir aerial images, captured by Envirosense Ltd. in August 2017, were used to generate a dense point cloud and an orthomosaic. The image acquisition was performed with a 60 MP resolution Leica RCD30 camera. Although a near-infrared band was available, we used only the RGB (red, green and blue) bands because we aimed to explore the possibilities of archive aerial photos having only visual bands. We applied the SfM technique to generate a photogrammetric point cloud. During the photogrammetric process, we used high-level settings, with reference preselection for the image alignment processing, in order to keep the original size of the raw aerial images and to achieve a more detailed result. In dense point cloud generation, we set the 'high quality' reconstruction parameter to use all image pixels and generate the most accurate geometry. Furthermore, we applied middle-level outlier filtering to eliminate outlier points [62]. The point density of the point cloud was 44.38 points/m². GPS (Global Positioning System) reference data and camera calibration parameters were also embedded in the images. We refined the accuracy of the point cloud with 11 ground control point (GCP) markers measured with a Stonex S9 RTK GPS to optimize the generated point cloud. RGB values were assigned to each point of the photogrammetric point cloud. Since the point density was relatively high, we applied the TIN (Triangulated Irregular Network) interpolation procedure (Delaunay triangulation surface creation with triangular facets [63]), then raster generation with natural neighbor rasterization (smooth terrain surface generation using area-based weighting [64]) using 0.1 m pixel resolution to create a digital surface model (DSM).


Point Cloud Classification and Derivation of Morphometric Variables
We conducted point cloud classification using the cloth simulation filter (CSF) [65] to separate ground and non-ground points of the point cloud as a key step of the DTM creation. As a first step, a manual outlier detection was performed to filter out the vertical outliers (we removed the points having lower height values than the ground surface) prior to the classification process. These outlier points over-represented the elevation range of the area; thus, this method helped to avoid false shifts of the terrain. The parameters were chosen by the "trial and error" method, excluding inappropriate solutions by visual interpretation and the 11 GCPs. Finally, the CSF was parameterized with a cloth size of 0.5 m and a threshold of 1.0 m, and the terrain scene was set to "flat" (in accordance with the plain characteristics of the study area), which ensured that only ground points were kept. The outcome was a smooth and refined terrain surface without rough "spikes". A DTM was generated with the TIN-based interpolation technique [63] and natural neighbor rasterization [64] using the ground points, with a resolution of 0.1 m. Subtracting the DTM raster layer from the DSM, we obtained the nDSM layer, a raster of the relative heights of the terrain objects. Furthermore, morphometric indices (slope and aspect) were derived from the DSM.
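The two raster derivations above (nDSM as the DSM-minus-DTM difference, and slope from the surface model) can be sketched in a few lines of pure Python; the 3×3 grids below are hypothetical toy values, not the paper's 0.1 m rasters:

```python
import math

def ndsm(dsm, dtm):
    """Relative object height: subtract the terrain model from the surface model."""
    return [[s - t for s, t in zip(srow, trow)] for srow, trow in zip(dsm, dtm)]

def slope_deg(dsm, cell=0.1):
    """Slope in degrees from central differences; border cells are left at 0."""
    rows, cols = len(dsm), len(dsm[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dzdx = (dsm[i][j + 1] - dsm[i][j - 1]) / (2 * cell)
            dzdy = (dsm[i + 1][j] - dsm[i - 1][j]) / (2 * cell)
            out[i][j] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out

# Toy grids: a 2.5 m object on flat terrain at 10 m elevation.
dsm = [[10.0, 10.0, 10.0], [10.0, 12.5, 10.0], [10.0, 10.0, 10.0]]
dtm = [[10.0] * 3 for _ in range(3)]
print(ndsm(dsm, dtm)[1][1])  # 2.5 (m relative height)
```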
Based on the photogrammetric point cloud, we obtained the normal vectors (Nx, Ny) for each point in the XY directions as point features [66], since a set of similar normal vectors indicates a homogeneous surface [67] (e.g., planar segments of building rooftops), whereas varying normal vectors refer to heterogeneous objects (e.g., vegetation). Using the same interpolation and rasterization procedure as before (TIN-based technique [63] and natural neighbor rasterization [64]), we created two raster layers from the Nx and Ny normal vectors with a pixel resolution of 0.1 m.
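The per-point normals in the paper come from the photogrammetric software; purely to illustrate the geometry, the sketch below computes the unit normal of a plane through three neighboring points via a cross product (hypothetical coordinates):

```python
def plane_normal(p0, p1, p2):
    """Unit normal of the plane through three 3D points (cross product of edges)."""
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    return (nx / norm, ny / norm, nz / norm)

# A horizontal roof plane: the normal points straight up, so Nx = Ny = 0.
n = plane_normal((0, 0, 5), (1, 0, 5), (0, 1, 5))
print(n)  # (0.0, 0.0, 1.0)
```

On a planar roof facet all triangles yield nearly identical (Nx, Ny); in vegetation the normals scatter, which is what makes the rasterized Nx/Ny layers informative.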

RGB Indices
We calculated spectral indices (hereafter RGB indices) from the RGB bands, usually referred to as pseudo-vegetation indices, to help discriminate non-vegetation objects from vegetation and to emphasize the contrast between vegetation and ground [68]. High reflectivity in the visible green band with lower intensity in the red and blue bands refers to vegetation [69][70][71]. Inversely, at non-vegetation points (e.g., roads, buildings), the red and blue bands have the highest reflectance [72,73] (Table 1). We created an individual raster layer for each index. Table 1. RGB indices calculation using RGB bands (R: red, G: green, B: blue band's pixel intensities).
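As a quick per-pixel sketch of four of the indices discussed here, using their standard definitions from the literature (the exact formula set used by the paper is given in its Table 1):

```python
def ngrdi(r, g, b):
    """Normalized green-red difference index."""
    return (g - r) / (g + r)

def gli(r, g, b):
    """Green leaf index."""
    return (2 * g - r - b) / (2 * g + r + b)

def vari(r, g, b):
    """Visible atmospherically resistant index."""
    return (g - r) / (g + r - b)

def rgbvi(r, g, b):
    """Red-green-blue vegetation index."""
    return (g * g - r * b) / (g * g + r * b)

# A vegetation-like pixel (green dominant) scores high on all four indices:
print(round(ngrdi(60, 120, 40), 3))   # 0.333
# A grey asphalt-like pixel scores near zero:
print(round(ngrdi(100, 100, 100), 3))  # 0.0
```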

Texture Information
Textural indices have been used in pattern recognition studies to analyze the texture of aerial photos. The obtained features represent spatial homogeneity and heterogeneity, referring to surface quality, the type of objects, and land cover. Building rooftops and the road network have a uniform texture compared with the non-uniformity of vegetation [80]. Texture is calculated on the basis of the statistical distribution of the observed pixel intensity variations in a defined neighborhood. Histogram-based first-order statistical metrics (e.g., mean and variance) do not take into account the spatial relationship of the pixel intensity values, as a histogram is a graphical representation of data dispersion [81]. Spatial features are considered in second-order texture metrics, such as the Haralick indices, which use the grey-level co-occurrence matrix (GLCM) to compute co-occurring intensity pairs based on two grey-level pixels at a given displacement and in a defined direction [82-87]. The scale of the moving window can affect the detail of the obtained texture information and the processing time [1,21,87]. The number of pixel intensity levels has an important role in the extraction of image texture; thus, a quantization method is required to reduce the intensity levels [84,85]. There are 14 Haralick textures derived from the GLCM, but only a few of them have become popular in remote sensing [21,82,83,85-87]. We calculated the Haralick textural information applying the four main directions (0°, 45°, 90°, 135°) as an average (isotropic) matrix [1,82,84-86]. Furthermore, we computed textural information derived from the run-length matrix (consecutive connected pixels of the same grey level as a run, and the number of pixels in the run as the length) [88,89] (Table 2). Then, 4 bit and 8 bit RGB composites were used as input data for the determination of texture information.
Radius (kernel) and offset parameters were selected after investigating several settings of radius and shift (2-, 3-, and 5-pixel radius and 1-, 3-, and 5-pixel offsets). We extracted the values of 1000 random points and then compared the distributions of run percentage by kernel and offset settings with an ANOVA test. The test was found to be non-significant (F = 2.331, p = 0.06). Finally, the parameterization settings of the texture generation were the following: the histogram generation was set to 8 bins, the radius (moving window size) was 2, and the offset value was 1 in both the X and Y directions. All of the measurements were generated on both 4 bit and 8 bit grayscale quantized raster layers. Table 2. Texture information calculation (µx, µy, σx, σy: means and standard deviations of px and py; p(i,j): (i,j)th entry in a normalized gray-tone spatial-dependence matrix and in the given run-length matrix; Ng: number of gray levels in the quantized image; Nr: number of run lengths that occur; P: number of points in the image).
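A toy sketch of the GLCM and two of the Haralick features named above (energy and inertia/contrast), averaged over the four main directions as in the isotropic case; the 4-level image is hypothetical, not the paper's data:

```python
def glcm(img, levels, dx, dy):
    """Symmetric, normalized grey-level co-occurrence matrix for offset (dx, dy)."""
    m = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for i in range(rows):
        for j in range(cols):
            i2, j2 = i + dy, j + dx
            if 0 <= i2 < rows and 0 <= j2 < cols:
                m[img[i][j]][img[i2][j2]] += 1
                m[img[i2][j2]][img[i][j]] += 1  # symmetric counting
    total = sum(sum(row) for row in m)
    return [[v / total for v in row] for row in m]

def energy(p):
    """Angular second moment: high for uniform texture (e.g., rooftops)."""
    return sum(v * v for row in p for v in row)

def inertia(p):
    """Haralick's contrast: high when neighboring grey levels differ."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))

img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
# Average over the 0°, 45°, 90°, 135° offsets at distance 1 (isotropic matrix):
dirs = [(1, 0), (1, 1), (0, 1), (-1, 1)]
iso_energy = sum(energy(glcm(img, 4, dx, dy)) for dx, dy in dirs) / 4
print(round(iso_energy, 3))
```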

OBIA-Based Segmentation
Our classification was based on a segmentation approach using the orthomosaic raster layer's RGB bands as input data with the OBIA-based seeded region growing algorithm [54,90]. We accepted the fact that using only visual bands we cannot delineate the outline of the objects with single segments; thus, we chose oversegmentation. Accordingly, we generated small segments and applied the following settings: band width of 2, neighborhood of 4, and the distance type set to the 'feature space and position' option with a variance of 1. The outcome was an oversegmented vector layer created from homogeneous clusters of pixels (Figure 2). We determined the mean pixel intensity values for each segment from all raster layers (RGB bands, RGB indices, texture information, and morphometric indices). Next, 200 segments were selected per land cover category (altogether 1000) as training data (building, vegetation, asphalt, bare soil, others), and the whole urban area was classified based on these classes. However, in accordance with our aims, we evaluated only the building land cover class.
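The seeded region growing idea can be illustrated on a single toy band (the actual OBIA tool works on the RGB bands with the feature-space-and-position distance described above; this simplified sketch grows a segment while 4-neighbors stay within a tolerance of the running segment mean):

```python
from collections import deque

def region_growing(img, tol=10):
    """Label connected regions of similar intensity; every unlabeled pixel seeds one."""
    rows, cols = len(img), len(img[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for si in range(rows):
        for sj in range(cols):
            if labels[si][sj]:
                continue
            next_label += 1
            labels[si][sj] = next_label
            total, count = img[si][sj], 1
            queue = deque([(si, sj)])
            while queue:
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not labels[ni][nj]
                            and abs(img[ni][nj] - total / count) <= tol):
                        labels[ni][nj] = next_label  # join the segment
                        total += img[ni][nj]
                        count += 1
                        queue.append((ni, nj))
    return labels

# Two homogeneous blocks (dark vs. bright) become two segments:
img = [[10, 12, 90, 92],
       [11, 13, 91, 95],
       [10, 11, 90, 93]]
print(region_growing(img))  # [[1, 1, 2, 2], [1, 1, 2, 2], [1, 1, 2, 2]]
```

A tighter tolerance produces more, smaller segments, which is the oversegmentation behavior deliberately chosen in the paper.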

Variable Data Sets and Data Preparation
Variables were arranged into different sets. According to our primary aim, to classify urban land cover and to identify buildings from orthophotographs of the visible range (RGB), we first used the raw RGB layers. We then tested the RGB indices, the textural information (using 4 bit or 8 bit rasters), and finally the morphometric indices. Next, we combined these index groups and also tested their contribution to the image classification. As another approach, we involved all possible variables with the original layers and with the principal components (PCs) of a principal component analysis (PCA). Furthermore, recursive feature elimination (RFE) was used to select the most important variables. We applied standardized PCA using the correlation matrix to reduce data dimensionality and to produce non-correlating orthogonal variables (PCs). Varimax rotation was applied. The number of PCs was determined based on Kaiser's rule and goodness-of-fit measures (root mean square residuals, RMSR; advanced goodness of fit index, AGFI). All variables were involved in the PCA model, but the correlation textural feature (4 bit and 8 bit) and the raster layers of the normal vectors (Nx, Ny) were excluded due to their low communality. PCs were used as input data similarly to the original variables. PCA was conducted in R 3.6.2 (R Core Team [91]) with the psych [92] and GPArotation [93] packages. RFE is a variable (feature) selection method used directly with a classification algorithm (e.g., random forest, support vector machine). The main concept is to remove the variable with the weakest contribution to the classification accuracy from the input set. In the next step, another variable is removed, and the procedure continues until only one variable remains [94]. The result is a ranking of the variables ordered by their importance in maximizing the overall accuracy.
RFE was conducted with 10-fold cross-validation with three repetitions in R 3.6.2 [91] with the caret package [95].
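The RFE loop described above can be sketched in pure Python: repeatedly drop the feature whose removal hurts a scoring function the least. The scorer and "usefulness" values below are hypothetical stand-ins for the cross-validated classifier accuracy used in the paper:

```python
def rfe_ranking(features, score):
    """Return features ordered from most to least important by recursive elimination."""
    remaining = list(features)
    eliminated = []
    while len(remaining) > 1:
        # Eliminate the feature whose removal keeps the score highest.
        worst = max(remaining,
                    key=lambda f: score([x for x in remaining if x != f]))
        remaining.remove(worst)
        eliminated.append(worst)
    # Last survivor is most important; earliest eliminated is least important.
    return remaining + eliminated[::-1]

# Toy score: each feature has a fixed contribution; a set's score is their sum.
usefulness = {"nDSM": 5.0, "RGBVI": 3.0, "GLI": 2.0, "blue": 1.0}
rank = rfe_ranking(usefulness, lambda fs: sum(usefulness[f] for f in fs))
print(rank)  # ['nDSM', 'RGBVI', 'GLI', 'blue']
```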


Random Forest
Random forest (RF) is a robust classifier algorithm with no prerequisite of normal distribution or variance homogeneity [96]. The main concept is to use several decision trees (usually 100-500) with bootstrapping to generate a random sample for each individual tree (i.e., random selection with replacement); the number of variables considered at each split of a single tree is the square root of the total number of variables. Finally, the class of a given object is determined by the largest number of "votes", summing the outcomes of the trees. We applied 500 decision trees, and the mtry (number of variables at each node of the decision trees) was selected using the repeated k-fold cross-validation (RKCV) technique (the best value was chosen automatically using 10-fold cross-validation with three repetitions; thus, based on 30 models).
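A minimal illustrative sketch of the RF idea (toy data; decision stumps stand in for full trees): bootstrap each tree's sample, restrict each tree to a random subset of roughly sqrt(p) features, and aggregate predictions by majority vote:

```python
import random
from collections import Counter

def stump_fit(X, y, feat_idx):
    """Best single-feature threshold split by training accuracy."""
    best = None
    for f in feat_idx:
        for t in sorted({x[f] for x in X}):
            for left, right in ((0, 1), (1, 0)):
                pred = [left if x[f] <= t else right for x in X]
                acc = sum(p == yy for p, yy in zip(pred, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, f, t, left, right)
    return best[1:]

def forest_fit(X, y, n_trees=25, seed=1):
    random.seed(seed)
    p = len(X[0])
    m = max(1, int(p ** 0.5))  # mtry ~ sqrt(number of variables)
    trees = []
    for _ in range(n_trees):
        idx = [random.randrange(len(X)) for _ in X]  # bootstrap sample
        feats = random.sample(range(p), m)           # random feature subset
        trees.append(stump_fit([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees

def forest_predict(trees, x):
    votes = [l if x[f] <= t else r for f, t, l, r in trees]
    return Counter(votes).most_common(1)[0][0]  # majority vote

# Toy: class 1 when both (hypothetical) features are high.
X = [[0.1, 1], [0.2, 2], [0.3, 1], [2.5, 8], [2.8, 9], [3.0, 8]]
y = [0, 0, 0, 1, 1, 1]
trees = forest_fit(X, y)
print(forest_predict(trees, [2.9, 9]))  # 1
```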

Support Vector Machine
Support vector machine (SVM) is also a robust and efficient algorithm. Although it was developed for binary classification as a support vector classifier [97], it has since been extended to multiclass problems [98,99]. The algorithm constructs hyperplanes (i.e., boundaries) in the multidimensional space determined by the input variables with the aim of maximizing the distance between the hyperplane and the nearest data point of each class [100]. SVM is an extension of the support vector classifier in which the boundaries can be non-linear and the number of classes can be >2. SVM uses kernels to overcome the issue of non-linearity, and users can choose among several solutions (e.g., polynomial, radial). In this study we applied the radial basis kernel.
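The radial basis kernel used here is K(x, y) = exp(-gamma * ||x - y||^2); a quick sketch (the gamma value below is an arbitrary illustration, not the fitted parameter):

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: similarity decays with squared distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))            # 1.0 (identical points)
print(round(rbf_kernel([0.0, 0.0], [1.0, 1.0]), 4))  # 0.3679 (= exp(-1))
```

Because the kernel depends only on distances, the decision boundary in the original feature space can be highly non-linear even though it is a hyperplane in the implicit kernel space.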

K-Nearest Neighbor
K-nearest neighbor (KNN) classification is a simple ML technique, which uses similarity among data points in terms of distances in the multidimensional space. The classification procedure starts with identifying the k most similar neighbors in the training dataset (where k is defined by the user), under the assumption that closer observations belong to the same class (distances are determined in the space defined by the input variables). In practice, it is reasonable to test several k-values to find the best setting; accordingly, we applied the RKCV method to find the best k-value (based on 30 models).
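A pure-Python sketch of the procedure just described, with hypothetical 2D feature vectors standing in for the segment features: find the k closest training points by Euclidean distance and take a majority vote:

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(pt, x)), label)
        for pt, label in zip(train_X, train_y)
    )
    top = [label for _, label in dists[:k]]
    return Counter(top).most_common(1)[0][0]

X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
y = ["asphalt", "asphalt", "asphalt", "building", "building", "building"]
print(knn_predict(X, y, [5.5, 5.2]))  # building
```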

Multiple Adaptive Regression Splines
Multiple adaptive regression splines (MARS) is a non-parametric adaptive approach developed for multivariate problems. It extends stepwise regression and decision trees, but it handles non-linearity and interactions between the target variable and the predictors [100,101]. Non-linearity is captured by knots as cutpoints. The classification works in two steps: (1) forward pass: a model is calculated involving all variables considering all possible knots; (2) backward pass: the algorithm removes the variables having the least contribution (pruning to the optimal number of knots) to obtain the best model, using the generalized cross-validation error metric (an approximation of leave-one-out cross-validation) [102,103]. We applied RKCV to obtain the optimal number of knots and the degree (number of interactions) based on 30 models.
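The knots mentioned above enter the model through hinge (basis) functions; a tiny sketch of the mirrored pair MARS adds at a candidate knot t, with arbitrary illustrative values:

```python
def hinge_pair(x, t):
    """The mirrored MARS basis pair at knot t: max(0, x - t) and max(0, t - x)."""
    return max(0.0, x - t), max(0.0, t - x)

# Each hinge is zero on one side of the knot and linear on the other,
# which lets a sum of hinges bend the fitted surface at the knots.
print(hinge_pair(5.0, 2.0))  # (3.0, 0.0)
print(hinge_pair(1.0, 2.0))  # (0.0, 1.0)
```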

Partial Least Squares
Partial least squares (PLS) is optimal when there are many correlating variables and the aim is to obtain the best predictor variables through an ordination approach. However, in this case, unlike PCA, the aim is not to maximize the explained variance but to maximize the classification accuracy. Variables are aggregated into factors whose contribution depends on the accuracy; the number of factors can be chosen by the user or determined automatically [104,105]. We applied the automatic determination of the number of factors, maximizing the overall accuracy (OA) using RKCV based on 30 models.

Accuracy Assessment
We applied three approaches to determine the accuracy of the models. (1) Overall accuracy (OA) was determined with the repeated k-fold cross-validation (RKCV) technique with 3 repetitions and 10 folds. The reference dataset was split randomly into 10 subsets; while 9 were used for training the model, 1 was used for testing, and in the next step another 9 subsets were used for training and another one for testing. The procedure ended when all subsets had been used as a test set [110,111]. Then, the random sampling and splitting was repeated three times. Altogether we had 30 models and OA values; thus, we were able to determine a minimum, a maximum, a mean, a median, and the quartiles for each model. This approach helps to handle the question of the representativeness of the reference data and to provide a more reliable output, referring to the uncertainties of the models (i.e., the effect of having different reference datasets) instead of calculating a single OA using the whole reference dataset at once. Models with low minimums or large interquartile ranges indicate unreliable solutions. Although we also determined Kappa indices, following Pontius and Millones [112] we did not report or interpret these measures. (2) Class-level metrics of accuracy (user's accuracy, UA; producer's accuracy, PA; F-measure, F1 [113]; intersection over union/Jaccard index, IoU [114]) can be calculated using the traditional confusion matrix. For this purpose, following the recommendation of Congalton [115], we assigned a further 50 segments per class (altogether 250) to calculate UA and PA values. As our main focus was to extract buildings, we calculated UA and PA values for this class and analyzed them with statistical techniques (Figure 3).
(3) As independent data, we applied the ISPRS (International Society for Photogrammetry and Remote Sensing) Benchmark Dataset of Toronto [116] and conducted the modeling with the RFE-10 and RFE-6 variables as input data for RF classification, focusing only on the test area (area #5). We also determined the variable importance (mean decrease accuracy, MDA, and mean decrease Gini, MDG) based on the RF model. Images were taken with a Microsoft Vexcel UltraCam-D camera. While our study area was a suburban zone, the ISPRS dataset covered a completely different area, with both 300 m high skyscrapers and lower buildings without vegetation.
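The class-level metrics listed in approach (2) all derive from the confusion matrix counts for the building class; a sketch with hypothetical toy counts (tp/fp/fn are not the paper's figures):

```python
def class_metrics(tp, fp, fn):
    """Class-level accuracy metrics from confusion-matrix counts for one class."""
    pa = tp / (tp + fn)           # producer's accuracy (recall)
    ua = tp / (tp + fp)           # user's accuracy (precision)
    f1 = 2 * ua * pa / (ua + pa)  # F-measure (harmonic mean of UA and PA)
    iou = tp / (tp + fp + fn)     # intersection over union (Jaccard index)
    return pa, ua, f1, iou

pa, ua, f1, iou = class_metrics(tp=45, fp=5, fn=3)
print(round(pa, 3), round(ua, 3), round(f1, 3), round(iou, 3))
# 0.938 0.9 0.918 0.849
```

Note that IoU is always the strictest of the four: it penalizes both omission (fn) and commission (fp) errors in a single ratio.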

Results of Data Preparation
PCA resulted in a model explaining 88% of the total variance with five PCs (Table 3). The model was confirmed by the RMSR (0.03, p < 0.01) and the AGFI (0.99), indicating a very good fit. PC1 accounted for 23% of the variance and correlated mainly with the 4 bit textural information: inverse difference moment, difference entropy, energy, entropy, grey-level non-uniformity, inertia, and variance. PC2 also accounted for 23% of the variance and correlated mainly with the same textural indices in 8 bit form. PC3 accounted for 17%, correlating with the RGB indices and some of the textural indices: normalized green-red difference (NGRDI), red-green-blue vegetation index (RGBVI), green leaf index (GLI), the red band, mean (8 bit), visible atmospherically resistant index (VARI), and run percentage (8 bit). PC4 accounted for 15%, correlating with mean (4 bit), the green and blue bands, and run percentage (4 bit). PC5 accounted for 9% and correlated with the morphometric indices derived from the DSM (slope, nDSM, and aspect). The RFE results showed that 30 of the 31 variables were needed to reach the highest classification accuracy (99% OA, Figure 4). However, after the first six variables (in importance order: nDSM, RGBVI, GLI, blue band, slope, VARI) the accuracy reached 98.37% OA, and involving the next four (NGRDI, aspect, and run percentage as 4 bit and 8 bit textural information) improved it to 98.93%. Further variables yielded improvements smaller than 0.7%; thus, we used two sets of variables from the RFE ranking, with the first 6 (RFE-6) and 10 (RFE-10) variables.

Classification Accuracies Using Different Sets of Input Variables and Classifiers
We ran 19 types of variable sets with five types of classification algorithms (Figure 5). The first variable set consisted of the original RGB bands, which are used in most studies. We gained a median of 82% OA using the SVM classifier, which, considering the limited capability of the visible spectrum in discriminating land cover, can be considered a good result. Using only the RGB indices performed worse: the median was 77%, and again SVM produced the best OA; however, KNN's median was only 0.5% worse, MARS 1%, and RF 2%. The textural indices, in both the 4 and 8 bit versions, reached a maximum of 57-58% OA, and among the classifiers MARS was the best in both cases. The morphometric indices provided slightly better OAs, but even RF's median was only 64%. In the next step we started to combine the variable groups, first the RGB bands with the RGB indices. This was not a relevant improvement, as involving the RGB indices resulted in a median accuracy of 83% (only 1% more than using the RGB bands alone). However, the morphometric indices combined with the RGB bands provided a considerably better classification: OA reached 97% with RF, while the MARS and SVM models performed only slightly worse (by 1% and 1.5%, respectively). The RGB bands and textural indices were somewhat less effective together; the median accuracy was highest with the RF classifier, 93% with the 4 bit and 90% with the 8 bit indices. If we combined the RGB bands with two types of indices (RGB and textural indices, or RGB and morphometric indices), the accuracies were always above 90%, and the combination of textural and morphometric indices resulted in better classifications (above 96-97%).
The combination of all sets of variables (RGB bands, RGB, textural, and morphometric indices) differed from using all possible variables at the same time only in that, for the combinations, we did not use the 4 and 8 bit textural indices together. The difference in the accuracies was 0.7% (98.5% for the combination and 99.2% when we used all variables). PCA involved almost the same input variables as using all possible variables, but, as indicated in the methodology, three variables were excluded from the model; therefore, the accuracy was slightly different, at 97%. Variable selection proved that 10 variables (RFE-10), selected by their importance, can provide almost the same result as the 31 variables. The highest median belonged to the MARS and RF classifiers, both with 99%, while with six variables (RFE-6) it was only 1% worse (98% with RF and MARS).
Generally, RF, SVM, and MARS performed the best, and usually RF had the highest OA values regarding the statistical parameters (minimum, median, maximum), but its advantage was usually <1-2%. KNN's median OAs varied: in two cases it had the second best performance (with the RGB bands and with the RGB indices), but generally this classifier was fourth in line, 2-20% worse than RF. PLS performed the worst, with OAs >40% below those of RF. We ranked the different variable set combinations by their performance in two ways: using only the RF classifier, and based on the best-performing classifiers (Figure 6a,b). The difference was minimal (PCA differed by one rank, changing place with RGB + 8 bit textural indices + morphometric indices), because in the cases when RF was not the best classifier it was worse only by a few percent (i.e., the differences among the variable sets (between groups) were larger than the differences between the classifiers within a variable set (within group)). RF was the best classifier in 12 of the 19 models, while MARS was the best in four cases and SVM in three. Not only the medians are important but also the distributions (i.e., the minimums, maximums, and quartiles also provide information about the reliability of the models). Accordingly, the first six models had the narrowest data ranges, too.
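A comparison of this kind can be sketched as cross-validated overall accuracy per classifier; the snippet below uses scikit-learn and synthetic data, and covers only three of the five classifiers (MARS and PLS-DA have no direct scikit-learn equivalents), so it illustrates the procedure rather than the study's exact setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Toy segment-feature matrix (hypothetical stand-in for a variable set).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=4, random_state=1)

classifiers = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=1),
    "SVM": SVC(kernel="rbf", C=10),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, clf in classifiers.items():
    oa = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    # Median OA per classifier, as reported in the comparison above.
    print(f"{name}: median OA = {np.median(oa):.3f}")
```

Running the same loop over each of the 19 variable sets would yield the between-group/within-group comparison described in the text.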

Accuracy Assessment on Category Level
We examined the classification accuracies on the category level, too, focusing on the buildings. A good classifier finds all the building segments (PA) and does not classify other categories as buildings (UA); therefore, we considered more than one solution an acceptable outcome, having a high accuracy level for both PA and UA. A simple RGB input produced 79% UA and 76% PA, which were among the lowest outcomes considering the possibilities. The lowest performance belonged to the texture information alone (both 4 bit and 8 bit versions); both the UAs and PAs were <65%. The application of RGB indices provided better PA (84%) and slightly (2%) worse UA. Using only the morphometric indices had 85% UA and 82% PA, and this classification model was the best among the solutions using the different types of indices separately. Texture information partly improved the UA when combined with the RGB bands, while the PA changed only 1% (8 bit) or got worse by 4% (4 bit). Adding the RGB indices or the morphometric indices to the RGB bands provided better results: 87% and 94%, and 90% and 94% (UA and PA), respectively. The best solutions, having 95% UAs and PAs, were gained when we combined three types of indices with the RGB bands or involved all possible variables in the classification models. Three sets of variables, the variable selection with RFE (both with 6 and 10 variables) and the combination of RGB bands, RGB indices, 4 bit texture information, and morphometric indices, provided the same accuracy with 100% UA and 98% PA. When we involved both the 4 bit and 8 bit versions of the texture information, the results were 2% worse for both UA and PA (Figure 7).
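PA and UA come straight from the confusion matrix; the helper below is a minimal sketch assuming rows are predictions and columns are reference labels (conventions vary, so check it against your matrix layout):

```python
import numpy as np

def producers_users_accuracy(cm):
    """Per-class PA and UA from a confusion matrix.
    Convention assumed here: cm[i, j] = segments of reference class j
    predicted as class i (rows = prediction, columns = reference)."""
    pa = np.diag(cm) / cm.sum(axis=0)  # producer's: fraction of reference found
    ua = np.diag(cm) / cm.sum(axis=1)  # user's: fraction of predictions correct
    return pa, ua

# Hypothetical two-class matrix: buildings (class 0) vs. everything else.
cm = np.array([[190, 10],
               [ 10, 190]])
pa, ua = producers_users_accuracy(cm)
print(pa[0], ua[0])  # → 0.95 0.95
```

Omission error is then 1 − PA and commission error is 1 − UA, matching the usage later in the section.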

Analysis of the F1-values also showed that the best accuracies were gained with the same input data as we found with UA and PA (Figure 8).
Considering that the F1-values were above 95% for seven variable combinations, and four of them reached 99% (rgb.t8.d, rgb.i.t4.d, rfe6, and rfe10), it is possible to omit some variables and use only the most important ones to avoid overfitting. Furthermore, F1 was able to point out the difference in contribution between the 4 bit and 8 bit texture information: used solely as input data, the 4 bit version (t4) was almost 10% worse. We found that the rank order based on IoU was identical to that of F1, but the values were lower, with the gap widening toward the poorer models: the differences ranged from 1% (for the four best models) to 17% (for t8).
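The behavior described here, IoU tracking F1's ranking while dropping faster for poor models, follows directly from the two formulas; a small sketch with hypothetical TP/FP/FN counts:

```python
def f1_and_iou(tp, fp, fn):
    """F1 (Dice) and IoU (Jaccard) from true/false positive and false negative counts."""
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou

# Good model: few errors, so F1 and IoU nearly agree.
f1_good, iou_good = f1_and_iou(196, 2, 4)
# Poor model: IoU penalizes errors harder, so the gap widens.
f1_poor, iou_poor = f1_and_iou(120, 60, 80)
print(f1_good, iou_good)  # ≈ 0.985, 0.970
print(f1_poor, iou_poor)  # ≈ 0.632, 0.462
```

Since IoU = F1 / (2 − F1), IoU is never larger than F1 and diverges from it as the error counts grow, which reproduces the 1% vs. 17% gaps reported above.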
Although accuracy assessment provides a quantified tool to find the best model, it is also important to inspect the outcomes and the maps (Figure 9). The visual assessment revealed that using the RGB bands, alone or combined with the RGB indices and/or texture information, produced a large commission error (i.e., 1-UA), while the 4 and 8 bit texture information had a serious omission error (i.e., 1-PA). The morphometric indices had an acceptable outcome, but real improvement required more types of variables. When the morphometric indices were involved among the input variables, the buildings were identified more efficiently. However, the 8 bit texture information caused commission error even when combined with other types of indices; the visual analysis confirmed that the 4 bit versions performed better.

Accuracy Assessment with the ISPRS Benchmark Dataset
Applying the RFE-10 and RFE-6 variables to the ISPRS database resulted in models with 96.2% and 96.0% OAs, respectively. Thus, nDSM, GLI, the blue band, slope, VARI, and RGBVI were as successful as the set of 10 variables. In addition to the OAs, we also determined the area-based indices: completeness, as the ratio of the modeled building area to the actual building area (88.7%); the ratio of the false negative building area to the actual building area (11.3%); and the ratio of the area falsely classified as building to the total non-building area (17.9%) (Figure 10). Variable importance showed that the nDSM and the blue band had the largest contribution (Table 4).
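The three area-based ratios can be computed from the overlay of the predicted and reference building masks; the snippet below is a sketch with hypothetical areas chosen to reproduce the reported percentages:

```python
# Hypothetical areas in m^2, obtained by overlaying the predicted building
# mask with the reference footprints (values chosen to match the text).
tp_area = 8870.0            # building area correctly modeled as building
fn_area = 1130.0            # building area missed by the model
fp_area = 1790.0            # non-building area labelled as building
non_building_area = 10000.0 # total reference non-building area

completeness = tp_area / (tp_area + fn_area)  # 0.887, as reported
omission = fn_area / (tp_area + fn_area)      # 0.113
false_alarm = fp_area / non_building_area     # 0.179
print(completeness, omission, false_alarm)
```

By construction, completeness and the false-negative ratio sum to 1, which is why the text reports the complementary 88.7% and 11.3%.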



Discussion
Land cover monitoring is a basic task in following the changes of the urban environment; however, there is always a trade-off between the classification accuracy and the available imagery regarding the date of the survey, the geometric resolution, and the number of spectral bands. A common issue is the lack of a NIR band; thus, the discrimination of vegetation, water features, and buildings is only possible with serious classification errors. However, when the only available data is a traditional orthophotograph, we have options beyond using only the RGB bands: RGB indices, textural indices, and several morphometric indices derived from the DSM (only if the raw images are available and we can conduct photogrammetric image processing) can improve the accuracy. Our results showed that if we add the appropriate variables to the RGB bands, the accuracy can be high, even up to 99%.
Using the original RGB bands, at least in our study area, yielded a good result with 82% OA, but we have to consider that there can be areas where the accuracy is lower due to similar land cover categories. Varga et al. [117] found that visible range orthophotos provided only 61-68% OA, depending on the given image; in their images, forests, vegetation on arable lands, and even water had a green tone, causing a large amount of misclassification. Cots-Folch et al. [118] gained 74% OA with the application of textural information. Al-Kofahi et al. [119] applied a similar approach to classify aerial images in urban areas and reached 89% accuracy. Xiaoxiao et al. [120] succeeded in reaching 94% accuracy on aerial imagery by combining segmentation and different image processing techniques; however, they involved the NIR band, too. When we used only the RGB, textural, or morphometric indices, OA varied: the performance of the visible vegetation indices (GLI, RGBVI, VARI, and NGRDI) was worse (77%) than simply using the RGB bands, but the textural indices (both 4 bit and 8 bit versions) and the morphometric indices provided even lower results (58% and 64%, respectively). However, the combination of the indices brought more accurate maps, and the analysis of the possible sets helped to find the optimal set of input variables which guaranteed a reliable classification accuracy.
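The visible-band vegetation indices named here have standard formulations (the paper's exact definitions may differ in scaling); a minimal sketch:

```python
def visible_indices(r, g, b):
    """Common visible-band vegetation indices from reflectances in [0, 1].
    Standard formulations; a small epsilon guards against division by zero."""
    eps = 1e-9
    ngrdi = (g - r) / (g + r + eps)                  # normalized green-red difference
    gli = (2 * g - r - b) / (2 * g + r + b + eps)    # green leaf index
    vari = (g - r) / (g + r - b + eps)               # visible atmospherically resistant index
    rgbvi = (g * g - r * b) / (g * g + r * b + eps)  # red-green-blue vegetation index
    return ngrdi, gli, vari, rgbvi

veg = visible_indices(0.2, 0.6, 0.2)   # green, vegetation-like pixel: high values
grey = visible_indices(0.5, 0.5, 0.5)  # grey, roof/pavement-like pixel: near zero
print(veg, grey)
```

All four indices contrast the green band against red (and blue), which is why vegetation scores high while grey roofs and pavements stay near zero, the separation these indices contribute to the classification.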
As our main idea was to explore the possible maximum that can be gained using only true color orthoimages, we had models including all variables. We revealed that, at least on the level of the reference data, even 99.2% accuracy was achievable. Application of all variables calculated in this study resulted in 99% OA, but PCA, which usually helps to improve accuracy [121,122], was slightly worse, at 98%. The reason can be that the 99% baseline was already high, and as some variables were omitted from the PCA model due to low communality, this model lost some accuracy. However, too many variables raise two issues: the problem of overfitting (results are true only for the given input dataset), and the question of the availability of the possible variables (whether the raw images are available and a point cloud and DSM can be produced). The applied variable selection method, RFE, provided a 6-element variable set whose accuracy was only slightly (1%) worse than including all variables. In other words, the selection revealed that two morphometric indices derived from the DSM, the nDSM and slope, three RGB indices (RGBVI, GLI, and VARI), and the blue band were enough to reach 98% OA. Adding another spectral index (NGRDI), the aspect, and the run percentage textural feature in both 4 bit and 8 bit form (i.e., the 10-variable set of RFE) raised the OA to 99%, almost the same level as using all variables. Our suggestion is to use the least number of variables; thus, if the morphometric indices can be calculated, nDSM and slope proved to be important inputs; furthermore, the visible-band vegetation indices were also important, and their common usage was reasonable in spite of their correlations (r was between 0.78 and 0.88). In addition, textural information can be omitted; its first appearance in the RFE rank was only at ninth place.
Bakula et al. [123] used multispectral LiDAR in an urban environment and found that the nDSM (normalized digital surface model, i.e., the height of objects above the terrain) as additional data to the optical bands was a key factor in discriminating urban land cover elements. They also applied spectral and textural information layers: starting from purely the spectral bands, where kappa was 0.244, they finally reached a kappa of 0.878; in our study, the kappa of the worst and best models was 0.33 (8 bit textural indices, PLS classifier) and 0.99 (all variables, RF and MARS classifiers), respectively. Although our results correspond with theirs, we emphasize that there were several differences: Bakula et al. [123] had an accurate nDSM based on laser scanned data; furthermore, they applied NIR and SWIR (Short-Wave Infrared) bands, while we used only the true color bands of an aerial survey. However, we were able to obtain classification accuracy similar to a LiDAR-based analysis.
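The nDSM discussed here is simply the surface model minus the terrain model; a minimal sketch with hypothetical elevation values:

```python
import numpy as np

# nDSM sketch: height of objects above the terrain.
# dsm: surface model (ground + objects); dtm: terrain model (ground only).
dsm = np.array([[102.0, 108.5],
                [101.5, 101.6]])
dtm = np.array([[101.8, 101.9],
                [101.4, 101.5]])

# Clip small negative residuals, which are typically interpolation noise.
ndsm = np.clip(dsm - dtm, 0, None)
print(ndsm)  # the ~6.6 m cell stands well above the terrain: a building candidate
```

Thresholding such an nDSM (e.g., a few meters) separates elevated objects from the ground, which is why it ranked as the single most important predictor in both our RFE results and the cited LiDAR study.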
If a DSM is not available and the only possibility is to use the spectral and textural information, the importance of the variables changes, too. The RGB indices as additional data did not yield a large increase in accuracy (82.5%, 0.5% more than using only the RGB bands), but the 4 bit textural indices did (93%, an 11% increase over the RGB bands). Based on the UA and PA, the 8 bit version of the textural indices performed a bit worse than the 4 bit one, but the difference in the classifications was below 1%. The visual analysis provided the further information that the 8 bit texture is less reliable and increases the commission error. This is in high accordance with Albregtsen [124], who also suggested the 4 bit version.
We applied five classifiers, of which RF, SVM, and MARS resulted in the best performance with >99% OAs; the differences among them were only 1-5%, while the OAs of PLS and KNN were up to 40% worse. Although all algorithms have proved their efficiency in different tasks [102,125-127], we observed that accuracy depended on the similarity of the spectral characteristics of the objects and the input data. Deep learning (DL) algorithms such as the artificial neural network (ANN) [128], convolutional neural network (CNN) [129], and recurrent neural network (RNN) [130] have become popular, and their efficiency can be higher than that of ML techniques. Comparing DL and ML techniques, we find mixed results. SVM's performance varies: it outperformed a CNN by 5% using hyperspectral data [131], but other studies reported it was 13% worse than a Siamese neural network using aerial images [132] and 18% worse than a CNN with a WorldView-3 image [133]. However, all our results were better than those reported with DL techniques in terms of OA, PA, UA, or F1; thus, a good reference dataset and the right set of explanatory variables can provide accurate solutions. We emphasize that simple orthophotos are not sufficient in themselves, but indices derived from the available raw images improved the accuracy by 20%. A disadvantage of DL methods lies in their requirement for a large training dataset [134-136], which is a limitation when applying them to segmented images.
When numerous variables are involved in a classification, the issue of overfitting arises. Overfitting can have two adverse effects: (i) the results will be very accurate but only for the training dataset (i.e., they cannot be repeated with independent data), and users will be misled by the overly good accuracy parameters while the prediction will in fact remain inaccurate [137,138]; (ii) the models can be biased by unnecessary variables with low contribution, which act like noise and deteriorate the model fit. Deng et al. [139] called attention to overfitting in the case of PLS, while robust non-linear models may handle a large number of variables well [140]. According to Breiman [96], RF handles overfitting with the test dataset if the hyperparameters are fine-tuned (as in our case, minimizing the prediction error with k-fold cross-validation, as suggested by Viswanathan and Viswanathan [141]). In the case of SVM, Han and Jiang [142] performed a thorough analysis with different kernels and found that Gaussian kernels can encounter overfitting issues due to the C parameter, which controls the misclassification (i.e., a large C parameter gives a smaller-margin hyperplane that tends to minimize misclassifications); thus, the forced accuracy raises the chance of overfitting, too. MARS was studied by Khuntia et al. [143], who found that its two-step prediction process, and especially the second step of backward variable selection (pruning), which removes the variables with the least contribution, helps to handle overfitting. To sum up, besides the potential risk of overfitting, which increases when many input variables are applied, all the cited studies agreed on the relevance of RKCV (or simple KCV), an important element for fine-tuning ML algorithms to find the best hyperparameters, and also for minimizing overfitting.
We applied RKCV (for parameter tuning and for testing) and found that testing with independent samples confirmed that the RF, SVM, and MARS algorithms were not biased by overfitting; otherwise, the OAs and class-level metrics would have shown considerably weaker performance.
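Repeated k-fold cross-validation of the kind described can be sketched with scikit-learn's RepeatedStratifiedKFold; synthetic data and hypothetical fold counts:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# RKCV sketch: repeating the k-fold split averages out a lucky or unlucky
# single partition, giving a more honest estimate of OA and its spread.
X, y = make_classification(n_samples=400, n_features=8, random_state=7)
rkcv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=7)
scores = cross_val_score(RandomForestClassifier(random_state=7), X, y, cv=rkcv)
print(f"OA: {scores.mean():.3f} ± {scores.std():.3f} over {len(scores)} folds")
```

A wide spread across the repeated folds would be the warning sign of an overfit or unstable model; the narrow ranges of our best models (Figure 5) correspond to small standard deviations here.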
Class-level classification performance focusing only on buildings brought similar results to the OA values. Texture information had the lowest, and involving all variables or the RFE-selected variables the largest, UA and PA values. On the class level we also preferred fewer variables to avoid overfitting; thus, we suggest using the six variables of RFE as input for building extraction. However, if the morphometric indices cannot be calculated due to the lack of raw aerial images, we suggest combining the RGB bands with the RGB indices and the 4 bit texture information. F1 and IoU, as accuracy measures, aggregate the results of PA and UA and indicated the same models as the best and worst ones. However, it makes sense to determine both (in other cases the results can differ): while F1 weights the true positive hits and provides a result closer to the average, IoU is more pessimistic and closer to the worst performance. Considering the smallest differences, we can find the best models and also select the ones requiring the fewest input data (RFE-6), with the least chance of overfitting. Although both metrics are sensitive to imbalanced data [144], in our reference dataset all land cover classes had 200 elements.
The application of the RFE-10 variables to the ISPRS dataset highlighted that these indices can be efficient, but with lower accuracy. Although the method was able to identify all buildings, the area-based completeness showed that the optical method has its limits. In the Toronto study area, green areas were missing in the downtown, and the appearance of the buildings differed from the suburban zone of Debrecen: box-like buildings with flat roofs in Toronto, where the only difference from the roads was the height. In Debrecen, the roofs were mainly hip and dormer roofs, where slope, aspect, and texture clearly discriminated them from flat pavements and roads. In the rank of MDA, nDSM was important in the identification of the highest buildings and roads, but run percentage was also important. Furthermore, due to the high buildings, shadows were the largest bias on the accurate classification, and smaller buildings (or parts of them) were not detected. Considering that other studies [145] with better efficiency used the advantages of ALS data, and accepting that our visible range optical image processing method has limits (flat roofs and shadows can decrease the detection accuracy), the gained 88.7% completeness can be an acceptable alternative for urban sprawl analysis in the lack of better input data (a near infrared band, an ALS point cloud).

Conclusions
Building extraction in urban environment is a key element of change detection. Our aim was to reveal the possibilities of land cover classification with special focus on building identification using only the orthoimage of visual bands and the photogrammetric point cloud determined from an aerial survey. We had the following findings.

- Classification performance using only one group of indices (i.e., RGB bands, texture, RGB indices, or morphometric indices) varied in a wide range. Texture information was the weakest, worse than using only the RGB bands. Morphometric indices performed better on the class level than overall, because the DSM and its derivatives added valuable information, especially for buildings. RGB indices had a relevant contribution to the improvement, but on the class level it was worse than the overall accuracy.
- Combining different groups of indices ensured higher accuracy on both the overall and the class level. The best option was to use the morphometric indices with the RGB bands, which had >90% OA, PA, and UA.
- Combining three types of indices provided the most efficient models, with >95% OA, PA, and UA. The RGB bands, RGB indices, morphometric indices, and the 4 bit texture information had the largest accuracy (100% UA and 98% PA). The 4 bit and 8 bit texture information differed little in these combinations, and it is most important to avoid their common application (using both versions decreases the accuracy).
- Model evaluation should contain the UA and PA values; with several model solutions, visualizing these metrics helps to find the trade-offs between omission and commission errors. In addition, F1 and IoU can express this with a single value, which helps to create accuracy ranks.
- RFE as a variable selection method provided an importance rank, and both the six- and ten-variable sets were efficient, providing the same accuracy as including all variables (100% UA and 98% PA). We suggest using the fewest variables to avoid overfitting. However, our most important variables (nDSM, RGBVI, GLI, blue band from RGB, slope, VARI) can differ in other study areas, so the methodology and the careful, customized variable selection are more important.
- The efficiency of this approach can be limited in areas where high buildings cast large shadows and building roofs are flat. While shadows bias the spectral profiles, flat roofs are identical to roads, pavements, and parking lots; thus, slope and aspect cannot discriminate buildings.
Results confirmed that archive images can provide appropriate data for urban sprawl monitoring focusing directly on the buildings.