Capability of Remote Sensing Images to Distinguish the Urban Surface Materials: A Case Study of Venice City

Many countries share an effort to understand the impact of growing urban areas on the environment. The spatial, spectral, and temporal resolutions of remote sensing images offer unique access to this information. Nevertheless, their use is limited because urban surface materials are highly diverse and are not easily distinguishable, either spatially or spectrally. This work aims to quantify the effect of the spatial and spectral characteristics of urban surface materials on their retrieval from images. To avoid other sources of error, synthetic images of the historical center of Venice were analyzed. A hyperspectral library characterizing the main materials of Venice, together with knowledge of the city, allowed us to create a starting image with a spatial resolution of 30 cm, a spectral resolution of 3 nm, and a spectral range of 365–2500 nm. This image was then spatially and spectrally resampled to match the characteristics of most remote sensing sensors. Linear spectral mixture analysis was applied to every resampled image to evaluate and compare their capabilities to distinguish urban surface materials. In short, the capability depends mainly on spatial resolution, secondarily on spectral range and mixed pixel percentage, and lastly on spectral resolution; impervious surfaces are more distinguishable than pervious surfaces. This analysis of capability behavior is important for selecting the most suitable remote sensing images and/or deciding on the complementary use of different data.


Introduction
The world's urban population is growing rapidly: it was 751 million (about 30% of the total) in 1950 and 4.2 billion (about 55%) in 2018 [1]. The United Nations predicts that urban areas will gain another 2.5 billion people by 2050, bringing the urban share to about 68%, and that the largest urban growth will take place in Africa and Asia [1]. Urbanization is a complex process, as it is "due to a combination of four forces: natural growth, rural to urban migration, massive migration due to extreme events, and redefinitions of administrative boundaries" [2]. Rapid urban development has made clear to the scientific and policy-making communities not only that cities play a key role in social, economic, and environmental systems [1-3] but also that their rapid growth has negative impacts on land and aquatic ecosystems, the climate, and the territory [1,4-8]. International institutions and local governments have devoted considerable effort to understanding and mitigating these impacts. For example, the National Development and Reform Commission (NDRC) in China introduced the "Low-Carbon City" initiative [9], the European Union (EU) established the "Thematic Strategy on the Urban Environment" in 2005 [10], the government of Mexico City launched a 15-year program called the "Mexico City Green Plan" in 2007 [11], and the United Nations promoted the "United Nations Environment Programme" (UNEP) [12].
Remote sensing images offer unique access to information about the status of land surfaces owing to their spatial, spectral, and temporal resolutions. Therefore, these data could play an essential role in characterizing urban areas and monitoring their expansion. Despite these observation capabilities, Seto et al. [13] retrieved only 326 studies that mapped urban land cover with airborne and satellite data; they analyzed the English-language literature published between 1988 and 2008 and indexed in the ISI Web of Science database.
Since impervious surface cover is the major component of urban land cover, it is regarded as an indicator of urban land cover, and it has been used [38] to monitor the extent of urban areas at country and global scales [17]. These studies focused on spatially and spectrally distinguishing urban areas from the background and were performed with sensors characterized by coarse spatial resolution and high revisit frequency. For example, the optical remote sensing sensors used for this purpose (i.e., urban cover mapping at country and global scales in Table 1) are DMSP-OLS (Defense Meteorological Satellite Program's Operational Line-scan System) (e.g., [39]), MERIS (MEdium Resolution Imaging Spectrometer) (e.g., [37]), and MODIS (Moderate Resolution Imaging Spectroradiometer) (e.g., the authors of [36,40,41] retrieved impervious surface land cover from MODIS data at spatial resolutions of 250 m, 500 m, and 1 km; Table 1).
In summary, urban land cover mapping was performed at the urban scale with hyperspectral, multispectral, and panchromatic optical remote sensing images at high and moderate spatial resolutions (e.g., ), and at country and global scales with multispectral optical remote sensing images at coarse spatial resolution (Table 1) (e.g., [36,37,39-41]). Nevertheless, a map (or image) can never fully represent the territory it depicts [42,43]. Indeed, the authors underlined that every urban land cover class has a large number of pixels that are not distinguishable from those of other classes [39-41]. To explain the large number of classification errors, Small [17] described urban areas as a complex mosaic of many different materials of different sizes and shapes. Therefore, to identify urban surface materials precisely, some studies merged different remote sensing data and auxiliary information [17-29,31-41] and combined different techniques [20-24,27,28,31-41]. Some papers compared a few remote sensing images and/or products to evaluate the thematic, spectral, and spatial capabilities to retrieve urban land cover from remote sensing data (e.g., [31,35,44]). Cavalli et al. [31] compared hyperspectral and multispectral remote sensing images with high and moderate spatial resolutions to evaluate the spectral capability to retrieve urban land cover. Tran et al. [35] highlighted that the optimal spatial resolution for urban remote sensing is closely related to three levels of urban observation (i.e., urban object, urban district, and urban spot). Potere et al. [44] examined the accuracy of several global urban maps identifying only two classes (i.e., urban and non-urban areas) and highlighted that these maps show large differences in urban extent.
Since these evaluations compared only a few sensors, they offered a partial view of the thematic, spectral, and spatial capabilities to retrieve urban surface materials from remote sensing data. Furthermore, the literature review suggests that fewer research efforts have been devoted to the spectral requirements for urban mapping than to the spatial resolution requirements [45].
In conclusion, the literature review highlights the importance of a complete view of spectral and spatial capabilities for improving the accuracy of urban land cover retrieval from remote sensing data. Therefore, this work aims to examine how the capabilities of most available remote sensing sensors to distinguish urban surface materials behave. To focus only on the spectral and spatial capabilities of these sensors and to avoid errors from other sources, a synthetic hyperspectral image of the historical center of Venice at high spatial resolution was created as a starting image; its spatial and spectral characteristics were modified to match those of most remote sensing sensors, and the capabilities of the resulting images to distinguish urban surface materials were evaluated and compared.

Study Area
The city of Venice is known worldwide as an important artistic and historical center. It lies within a large lagoon in the northwestern Adriatic Sea that is named after the city. It is built on about a hundred small islands, where buildings stand close to one another, separated by narrow streets (called "calli"), squares of different shapes (called "campi"; only the square facing the Basilica of San Marco is called "piazza"), streams, and canals. The Grand Canal and the Giudecca Canal are the main canals: the first crosses the historical center in a winding line, and the second, which is wider and deeper than the Grand Canal, is also a transit route for large tourist and container ships. However, last March, the Italian Council of Ministers decided, as requested by UNESCO, that these large ships must be kept away from the historical center.
The islands are connected by several bridges; most of them are made of stone and the others of wood or metal. A bridge (i.e., Ponte della Libertà) connects the mainland to the western part of the city, which is the only portion accessible to wheeled vehicles. In the rest of the city, people and goods move on foot or by boat. Most buildings have structures (called "altane") located on their roofs and made almost entirely of wood. From the 15th century onwards, these structures were built to enlarge the living space or to provide a sunny and airy area in an urban mosaic of very narrow streets and squares.
The historical center of Venice is characterized by a mixture of urban surface materials of different sizes and shapes. For this reason, Venice was chosen as the test site for evaluating the capability of most remote sensing images to distinguish urban surface materials (Figure 1).

Hyperspectral Library
A careful analysis of the literature and the knowledge of the city acquired during extensive field campaigns, conducted from June to September 2001, in June 2004, and in May 2005, led to the identification of thirteen classes of materials that summarize well the land cover materials of the historical center of Venice (Table 2) [31,46]. During these campaigns, many images were acquired by panchromatic, multispectral, and hyperspectral sensors characterized by different spatial resolutions [31,46,47].
Table 2. Classes of materials of the historical center of Venice.
1 Old Tiles
2 New Tiles
3 Lead: lead tiles were used as covering material for the public buildings and domes.
4 Concrete: concrete was primarily used in the western side of the city and in the harbor areas.
5 Trachyte: trachyte rock came from quarries of the Euganean Hills (Italy) and was used for paving pedestrian streets.
6 Limestone: limestone rock came from quarries of Pietra d'Istria (Italy) and was used for decoration in the urban paving.
7 Asphalt: asphalt was primarily used in the western side of the city and in the harbor areas.
8 Pebbles
9 Sand
10 Wood: wood was used for paving some bridges, jetties, and swings.
11 Grass
12 Trees
13 Water: water of the channels and streams.
Classes 1 to 7 can be classified as impervious surface cover. The identified classes of urban surface materials were spectrally characterized not only at different sites of the historical center, to sample their natural spectral variability, but also in the laboratory, to better assess the spectral features of the materials [31]. The field campaigns were conducted using a portable ASD (Analytical Spectral Devices) FieldSpec Full-Range Pro spectrometer [31,46]. The ASD spectrometer covers the spectral range from 350 to 2500 nm using two detectors: one covers 350 to 1050 nm with a spectral sampling interval of 1.4 nm, and the other covers 1000 to 2500 nm with a spectral sampling interval of 2.0 nm. The field ASD measurements were collected within 2 h of solar noon by acquiring 4-5 measurements of each target from a height of 1 m with a 25° field of view, so that the field of view fit the target size. A National Institute of Standards and Technology (NIST) calibrated panel (i.e., a Spectralon reference standard) was used to derive absolute reflectance spectra [31,46]. In this way, a spectral library containing accurate spectra and accounting for the variability of each class was created (Figure 2).
Since this work aims to focus only on the spatial and spectral capability of most remote sensing sensors to distinguish urban surface materials, a synthetic image of the historical center of Venice was created to avoid errors from other sources, and its spectral and spatial characteristics were modified to match those of most remote sensing images. Errors due to atmospheric and environmental effects were avoided by assembling reflectance spectra acquired in situ. Errors due to the use of an incomplete spectral library were avoided by identifying the main surface classes present in the historical center of Venice and using only their spectra to create the synthetic image. Other authors have also created synthetic remote sensing images to demonstrate the potential of data and models or to test algorithms and compare their results [48-50]. This simulation was possible thanks to detailed knowledge of the city. The IKONOS-2 sensor acquired an image of the historical center of Venice on 2 April 2001; this image, spatially resampled to 30 cm resolution, was chosen as the base image on which the areas occupied by the identified surface classes were manually digitized. Only the areas of the Grass and Trees classes were instead identified by applying thresholds to the Normalized Difference Vegetation Index (NDVI) computed from the resampled IKONOS image. Each identified surface class was thus associated with a mask representing the locations of that class. All pixels in each mask were used as ground truth in the validation of the classification results, to avoid the errors that arise when only a small number of pixels is used for validation. Figure 3 shows the 13 masks used to build the synthetic image.
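The NDVI thresholding used for the Grass and Trees masks can be sketched as follows. This is a minimal sketch assuming NDVI = (NIR − Red)/(NIR + Red) computed on the resampled IKONOS bands; the function names and the threshold values (`grass_range`, `tree_min`) are hypothetical placeholders, since the thresholds actually used are not reported here.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); pixels where NIR + Red = 0 are set to 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def vegetation_masks(nir, red, grass_range=(0.2, 0.5), tree_min=0.5):
    """Return hypothetical Grass and Trees masks from NDVI thresholds.

    The default thresholds are illustrative placeholders, not the study's values.
    """
    v = ndvi(nir, red)
    grass = (v >= grass_range[0]) & (v < grass_range[1])
    trees = v >= tree_min
    return grass, trees


# Toy 2 x 2 example with digital numbers for the NIR and red bands.
nir = np.array([[60, 30], [10, 30]])
red = np.array([[20, 20], [10, 10]])
grass, trees = vegetation_masks(nir, red)
```

Separating the two vegetated classes by NDVI ranges is only one possible rule; in practice the thresholds would be tuned on the image itself.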
The next step was to create an image with the same size as the resampled IKONOS image and with 624 bands (i.e., the number of bands required to cover the spectral range from 365 to 2500 nm with a spectral resolution of 3 nm). The masks were used to select the parts of this image corresponding to each class (Figure A1a), and each part was multiplied by the spectrum of the corresponding class (Figure 2). To simulate the internal spectral variability of each urban surface class, each mask was associated with the mean spectrum of its class and with its variability: alternate pixels of the mask were multiplied by the mean spectrum plus the standard deviation and by the mean spectrum minus the standard deviation. In other words, imagining a chessboard of the same size as the image, the pixels corresponding to the black squares were multiplied by the mean spectrum plus the standard deviation (Figure A1c), whereas those corresponding to the white squares were multiplied by the mean spectrum minus the standard deviation (Figure A1d). These steps were repeated for each mask, and the resulting images were assembled to create the starting synthetic image (Figure A1b). Previous papers analyzed most remote sensing images, which have different spectral ranges and different spectral and spatial resolutions. Their characteristics were grouped into two spectral ranges, five spectral resolutions, and six spatial resolutions (Table 3). To retain spectral variability in all spatially resampled images, the spatial resolution of the starting synthetic image was set to 30 cm, because the pixels of the spatially resampled images then do not contain an integer and even number of pixels of the starting synthetic image (Table 3); with an integer and even number, the pixels built with the mean plus and minus the standard deviation would be balanced and their contributions would cancel out.
Therefore, the starting synthetic image of the historical center of Venice, characterized by a spectral range from 365 to 2500 nm and by spectral and spatial resolutions of 3 nm and 30 cm, respectively, was resampled into 60 images (2 spectral ranges × 5 spectral resolutions × 6 spatial resolutions; Table 3) to assess the capabilities of most remote sensing images to retrieve urban surface materials.
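The chessboard assignment of the mean spectrum plus or minus the standard deviation, and the reason why an even integer aggregation factor would erase it, can be illustrated with a small NumPy sketch. It assumes a (rows, cols, bands) cube and spatial resampling by simple block averaging over integer factors; `fill_class` and `block_mean` are hypothetical helper names. The study itself avoided the cancellation by choosing a 30 cm grid whose resampled pixels do not contain an even integer number of starting pixels.

```python
import numpy as np


def fill_class(cube, mask, mean_spec, std_spec):
    """Write one class into a (rows, cols, bands) cube on a chessboard pattern.

    'Black' squares ((row + col) even) receive mean + std,
    'white' squares ((row + col) odd) receive mean - std.
    """
    rows, cols = np.indices(mask.shape)
    black = (rows + cols) % 2 == 0
    cube[mask & black] = mean_spec + std_spec
    cube[mask & ~black] = mean_spec - std_spec
    return cube


def block_mean(band, factor):
    """Average non-overlapping factor x factor blocks of a 2-D band (edges cropped)."""
    r, c = band.shape
    r2, c2 = r - r % factor, c - c % factor
    blocks = band[:r2, :c2].reshape(r2 // factor, factor, c2 // factor, factor)
    return blocks.mean(axis=(1, 3))


# A 12 x 12 single-band patch fully covered by one class (illustrative values).
cube = np.zeros((12, 12, 1))
mask = np.ones((12, 12), dtype=bool)
fill_class(cube, mask, np.array([0.30]), np.array([0.05]))
band = cube[..., 0]

even = block_mean(band, 2)  # each block holds 2 "+std" and 2 "-std" pixels
odd = block_mean(band, 3)   # each block holds 5 pixels of one sign and 4 of the other
```

With a factor of 2, every aggregated pixel equals the mean (0.30), so the internal variability disappears; with a factor of 3, the aggregated pixels alternate between 0.30 + 0.05/9 and 0.30 − 0.05/9.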

Evaluation of Spectral and Spatial Capabilities Using Linear Spectral Mixture Analysis and FAMs
To address the problem of image pixels containing the spectral information of more than one material, most works employed the spectral mixture analysis approach [17,20,21,23,24,27,31,38,41]. Among these classifiers, linear spectral mixture analysis (LSM) assumes that the reflectance of a pixel is the sum of the reflectances of the materials present in it, each weighted by its abundance [51]. Reducing the residual error and producing more accurate abundances requires knowing all the endmembers present in each pixel [23,51]. Since all the surface materials of the starting image are known, LSM was applied. Since every urban mosaic is characterized by a great diversity in the size and shape of its surface materials, the starting image was divided into 10 squares to analyze the variability of the sizes and shapes of the surface materials. Each square had a 1 km side, so that it contained 4 pixels per side (16 pixels in total) when the image was resampled to the coarsest spatial resolution (i.e., 250 m).
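The linear mixing model behind LSM, in which a pixel spectrum is the abundance-weighted sum of the endmember spectra, can be inverted by least squares. The sketch below is one common formulation, not necessarily the implementation used in the study: it solves all pixels at once with `numpy.linalg.lstsq` and, as an assumption, enforces the sum-to-one constraint softly by appending a heavily weighted row of ones (the weight value is arbitrary). Abundances are not constrained to be non-negative.

```python
import numpy as np


def unmix(pixels, endmembers, sum_to_one=True, weight=1e3):
    """Estimate abundance fractions with the linear mixing model x = E^T f + e.

    pixels:     (n_pixels, n_bands) reflectance spectra
    endmembers: (n_endmembers, n_bands) endmember spectra
    returns:    (n_pixels, n_endmembers) abundance fractions (0-1)
    """
    E = np.asarray(endmembers, dtype=float).T   # (n_bands, n_endmembers)
    X = np.asarray(pixels, dtype=float).T       # (n_bands, n_pixels)
    if sum_to_one:
        # Soft sum-to-one constraint: an extra, heavily weighted "band" of ones.
        E = np.vstack([E, weight * np.ones((1, E.shape[1]))])
        X = np.vstack([X, weight * np.ones((1, X.shape[1]))])
    F, *_ = np.linalg.lstsq(E, X, rcond=None)
    return F.T


# Three illustrative endmember spectra with five bands each.
endmembers = np.array([
    [0.10, 0.20, 0.30, 0.40, 0.50],
    [0.50, 0.40, 0.30, 0.20, 0.10],
    [0.20, 0.20, 0.60, 0.20, 0.20],
])
true_f = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])
pixels = true_f @ endmembers        # noise-free mixed pixels
est_f = unmix(pixels, endmembers)   # recovers true_f up to rounding
```

Multiplying `est_f` by 100 gives the percentages used in the fractional abundance images.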
Before applying the LSM classifier, the starting image, divided into 10 squares, was resampled according to the spectral ranges and the spectral and spatial resolutions summarized in Table 3. The spectral library, which included only the mean spectra of all identified materials, was spectrally resampled according to Table 3 before being used as the endmember spectra. The LSM result consists of fractional abundance images, one per endmember spectrum; each pixel of these images represents the percentage of the class that is spectrally characterized by the corresponding endmember spectrum. LSM was applied to each spectrally and spatially resampled image.
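Spectral resampling of the library (and of the starting image) to a coarser sensor can be sketched by weighting each fine spectrum with the spectral response functions of the target bands. The source does not state which response functions were used; this sketch assumes Gaussian responses whose full width at half maximum equals the nominal spectral resolution, and `resample_gaussian` is a hypothetical helper name.

```python
import numpy as np


def resample_gaussian(wl, spectrum, centers, fwhm):
    """Resample a finely sampled spectrum to broader bands.

    wl:       fine wavelength grid (nm), shape (n,)
    spectrum: reflectance on that grid, shape (n,)
    centers:  target band centers (nm), shape (m,)
    fwhm:     band width(s) at half maximum (nm), scalar or shape (m,)
    Each output band is the response-weighted mean of the fine spectrum.
    """
    wl = np.asarray(wl, dtype=float)
    centers = np.asarray(centers, dtype=float)
    fwhm = np.broadcast_to(np.asarray(fwhm, dtype=float), centers.shape)
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    # Gaussian spectral response of every target band on the fine grid: (m, n).
    srf = np.exp(-0.5 * ((wl[None, :] - centers[:, None]) / sigma[:, None]) ** 2)
    return (srf * np.asarray(spectrum, dtype=float)).sum(axis=1) / srf.sum(axis=1)


# Fine grid (1 nm) over the 400-1100 nm range and a linear test spectrum.
wl = np.arange(400.0, 1101.0, 1.0)
spectrum = wl / 1000.0
bands = resample_gaussian(wl, spectrum, centers=[550.0, 750.0], fwhm=100.0)
```

For a band that lies symmetrically within the grid, a linear spectrum is returned unchanged at the band center (0.75 for the 750 nm band above); real sensors would use their published response functions instead of Gaussians.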
LSM results were evaluated using fractional abundance models (FAMs) as ground truth because they represent the cumulative endmember abundance fractions versus the percentage of pixels. FAMs were built from three observations: (i) the resampled mask includes only pixels with 100% abundance of the corresponding class; (ii) the pixels of the starting mask not included in the resampled mask have an abundance of less than 100%; (iii) the abundance fractions of these pixels can be estimated using buffer zones around the resampled mask, because they depend on the distance from the resampled mask.
The first step in building these models was to calculate the percentage of pixels with abundance equal to 100% using the starting masks. Each mask was spatially resampled to 0.5, 1, 2, 2.5, 5, 10, 16, 20, and 50 m (Figure A2a,b shows the Old Tiles mask resampled to 50 cm and 5 m, respectively); it was then resampled back to 50 cm, so that all results had the same number of pixels and could be compared easily; finally, the pixels of the twice-resampled mask were counted and compared with the total number of pixels of the starting mask, since their percentage ratio equals the percentage of pixels with 100% abundance of the corresponding class (Figure A2c shows in red the Old Tiles mask resampled twice, first to 5 m and then to 0.5 m, superimposed on the Old Tiles mask at 0.5 m in white). In other words, the masks were resampled twice: the first time to identify the pixels with 100% abundance (Figure A2b), and the second time, at 50 cm, to calculate the percentage of pixels with 100% abundance (Figure A2c). The spatial resolution of 50 cm was chosen because each pixel of a mask resampled to 0.5, 1, 2, 2.5, 5, 10, 16, 20, or 50 m contains an integer number of 50 cm pixels.
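The first FAM step, resampling a mask to a coarse grid to find pure pixels and then back to 50 cm to count them, can be sketched with block operations on the 50 cm grid. This sketch assumes that a coarse pixel has 100% abundance only when all the 50 cm pixels it contains belong to the class, that the coarse resolution is an integer multiple of 50 cm (true for 0.5 to 50 m in the list above), and that incomplete edge blocks are dropped; `pure_pixel_percentage` is a hypothetical name.

```python
import numpy as np


def pure_pixel_percentage(mask, factor):
    """Percentage of starting-mask pixels that fall in pure coarse pixels.

    mask:   boolean class mask on the 50 cm grid
    factor: coarse resolution / 0.5 m (e.g., 10 for a 5 m pixel)
    """
    mask = np.asarray(mask, dtype=bool)
    r, c = mask.shape
    r2, c2 = r - r % factor, c - c % factor
    blocks = mask[:r2, :c2].reshape(r2 // factor, factor, c2 // factor, factor)
    # First resampling: keep only coarse pixels entirely covered by the class.
    pure_coarse = blocks.all(axis=(1, 3))
    # Second resampling: back to 50 cm so it can be compared with the starting mask.
    pure_fine = np.repeat(np.repeat(pure_coarse, factor, axis=0), factor, axis=1)
    return 100.0 * pure_fine.sum() / mask.sum()


# Illustrative 10 m x 10 m class patch (20 x 20 pixels at 50 cm),
# offset by one 50 cm pixel from the origin of a 15 m x 15 m tile.
mask = np.zeros((30, 30), dtype=bool)
mask[1:21, 1:21] = True
p_2m5 = pure_pixel_percentage(mask, 5)   # 2.5 m pixels
p_2m = pure_pixel_percentage(mask, 4)    # 2 m pixels
```

In this toy case only 56.25% of the patch falls in pure 2.5 m pixels and 64% in pure 2 m pixels, showing how the share of 100% abundance pixels depends on the pixel size and on the alignment of the patch with the coarse grid.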
The second step was to compute the percentage of pixels with abundance greater than 75% by creating a buffer zone around the twice-resampled mask within the starting mask. This buffer zone had a width equal to a quarter of the analyzed spatial resolution; the number of its pixels was counted and compared with the total number of pixels of the starting mask, and their percentage ratio equals the percentage of pixels of the corresponding class with abundance greater than 75%.
Similarly, the third step was to calculate the percentage of pixels with abundance greater than 50% by creating another buffer zone around the twice-resampled mask within the starting mask (Figure A2d shows in blue the buffer zone of the twice-resampled Old Tiles mask overlaid on the masks in Figure A2c; this buffer zone includes the pixels at a distance of 2.5 m or less from the twice-resampled mask). This second buffer zone had a width equal to half of the analyzed spatial resolution; the number of its pixels was counted and compared with the total number of pixels of the starting mask, and their percentage ratio equals the percentage of pixels of the corresponding class with abundance greater than 50%.
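The buffer zones of the second and third steps can be sketched with Euclidean distances on the 50 cm grid: pixels of the starting mask within a quarter (or half) of the coarse resolution from the twice-resampled mask are counted. This sketch rests on several assumptions: distances are measured between pixel centers, the pure pixels themselves are counted as part of the zone (so the percentages are cumulative), and the brute-force distance computation is meant only for small examples (`scipy.ndimage.distance_transform_edt` scales better). `buffer_percentage` is a hypothetical name.

```python
import numpy as np


def buffer_percentage(start_mask, pure_mask, width_m, pixel_m=0.5):
    """Percentage of starting-mask pixels within `width_m` metres of the pure mask.

    Distances are between pixel centres on the 50 cm grid; pure-mask pixels
    (distance 0) are included, so the result is cumulative.
    """
    start_mask = np.asarray(start_mask, dtype=bool)
    pure_mask = np.asarray(pure_mask, dtype=bool)
    if not pure_mask.any():
        return 0.0
    pr, pc = np.nonzero(pure_mask)
    yy, xx = np.indices(pure_mask.shape)
    # Squared distance from every pixel to every pure pixel, then the minimum (brute force).
    d2 = (yy[..., None] - pr) ** 2 + (xx[..., None] - pc) ** 2
    dist_m = np.sqrt(d2.min(axis=-1)) * pixel_m
    zone = (dist_m <= width_m) & start_mask
    return 100.0 * zone.sum() / start_mask.sum()


# One row of 11 class pixels at 50 cm; only the central pixel is pure.
start = np.ones((1, 11), dtype=bool)
pure = np.zeros((1, 11), dtype=bool)
pure[0, 5] = True
resolution_m = 4.0
p75 = buffer_percentage(start, pure, resolution_m / 4)  # 1 m buffer
p50 = buffer_percentage(start, pure, resolution_m / 2)  # 2 m buffer
```

For a 5 m resolution, the half-resolution buffer is 2.5 m, matching the Old Tiles example of Figure A2d.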
Subsequently, the percentages of pixels with abundance equal to 100%, greater than 75%, and greater than 50% were plotted against spatial resolution in order to compute the best-fitting functions and to assign these pixel percentages to the spatial resolutions of most remote sensing images (Table 3).
The masks were resampled to many spatial resolutions (i.e., 0.5, 1, 2, 2.5, 5, 10, 16, 20, and 50 m) to better estimate the best-fitting functions required to assign the percentages of pixels with abundance of 100%, greater than 75%, and greater than 50% to the spatial resolutions of most remote sensing images. All the calculated best-fitting functions were linear. Figure 4 shows, as an example, the scatterplots of the Water, Old Tiles, and Trachyte classes obtained using the mean values for all zones. Tables A1 and A2 summarize the features of the best-fitting functions (i.e., the mean and standard deviation values of the slopes, the intercepts, and the linear regressions) for the percentages of pixels with 100% abundance and with abundance greater than 50%, respectively.
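Fitting a linear function and using it to assign pixel percentages to sensor resolutions can be sketched with `numpy.polyfit`. The percentages below are illustrative values generated from an exact line, not results from the study; the list of sensor resolutions is assumed to be the six discussed in the results (1, 5, 10, 50, 100, and 250 m); and clipping predictions to the 0-100% range when extrapolating beyond 50 m is an assumption of this sketch.

```python
import numpy as np

# Resolutions (m) at which the masks were resampled.
res_m = np.array([0.5, 1, 2, 2.5, 5, 10, 16, 20, 50])
# Illustrative (not measured) percentages of pixels with 100% abundance.
pct_pure = 95.0 - 1.5 * res_m

slope, intercept = np.polyfit(res_m, pct_pure, deg=1)


def assign_percentage(resolution_m):
    """Evaluate the fitted line at a sensor resolution, clipped to 0-100%."""
    return float(np.clip(slope * resolution_m + intercept, 0.0, 100.0))


# Assumed sensor spatial resolutions (m).
sensor_res_m = [1, 5, 10, 50, 100, 250]
assigned = {r: assign_percentage(r) for r in sensor_res_m}
```

The same fit would be repeated for the percentages above 75% and 50%, and for each class and zone.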
In the last step of FAM implementation, the percentages of pixels with abundance equal to 100%, greater than 75%, and greater than 50%, identified for all selected spatial resolutions (Table 3), allowed us to calculate the distributions of the cumulative abundance fractions of each endmember versus the percentage of pixels (i.e., the FAMs). FAMs were computed for each class in each zone at each spatial resolution (i.e., every class was characterized by 60 models: 10 zones × 6 spatial resolutions). Figure 4 shows the averages of the FAMs computed for the Water, Old Tiles, and Trachyte masks resampled to 10 m (i.e., the green lines in Figure 4a-c, respectively).
A pure pixel contains 100% abundance of a single endmember, whereas a mixed pixel contains abundance fractions (from 1 to 100%) of more than one endmember. The average percentages of mixed pixels (i.e., the percentages of pixels with abundance below 99%) of all surface classes and surfaces were calculated for images at all analyzed spatial resolutions using the averages of the FAMs (Table A3).
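Counting mixed pixels from a set of abundances is straightforward once the 99% threshold is fixed. The helper name `mixed_pixel_percentage` and the use of percentages (0-100) are assumptions of this sketch; the values are illustrative.

```python
import numpy as np


def mixed_pixel_percentage(abundances_pct, threshold=99.0):
    """Percentage of pixels whose abundance (in %) is below `threshold`.

    Pixels at or above the threshold are treated as pure; all others as mixed.
    """
    a = np.asarray(abundances_pct, dtype=float)
    return 100.0 * np.count_nonzero(a < threshold) / a.size


# Abundances of one class over the pixels of its starting mask (illustrative).
abund = np.array([100, 100, 99.5, 80, 60, 40, 10, 100])
share_mixed = mixed_pixel_percentage(abund)
```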
LSM results were evaluated by analyzing only the pixels of each fractional abundance image that fall within the starting mask of the corresponding endmember. The abundance fractions of these pixels allowed us to calculate the cumulative abundance fractions of each endmember versus the percentage of pixels. LSM results were therefore evaluated with four metrics that compare these cumulative endmember abundance fractions with the corresponding FAMs: the Kling-Gupta efficiency (KGE) and, in absolute values, the sum of the differences over all abundances (Total Errors), the sum of the differences for abundances from 100 to 50% (100-50% Errors), and the sum of the differences for abundances below 49% (49-0% Errors). This split of abundance percentages was made because it separates pixels assigned to an endmember from those that are not assigned [25,51]. The KGE value was used to evaluate the similarity between the classified and modeled endmember abundance fractions as a function of pixel percentage; it measures this similarity using the linear correlation (R), standard deviation (σ), and mean (µ) of the modeled and classified data [52]. KGE is used to calibrate and evaluate not only hydrological models but also bio-optical models [53]. The literature groups KGE values into three classes: a value of 1, positive values, and negative values indicate equal, approximately similar, and non-comparable distributions, respectively [52].
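The four evaluation metrics can be sketched as follows. KGE uses its standard definition, KGE = 1 − sqrt((R − 1)² + (α − 1)² + (β − 1)²), with α = σ_classified/σ_modeled and β = µ_classified/µ_modeled. The sketch assumes that the classified curve is obtained by sorting the abundances in descending order, that it is sampled at the same pixel percentages as the FAM, and that the 100-50% / 49-0% split is made on the modeled abundance; these choices, the helper names, and the values are illustrative.

```python
import numpy as np


def abundance_curve(abundances_pct):
    """Classified abundances (%) sorted in descending order, i.e. versus pixel percentage."""
    return np.sort(np.asarray(abundances_pct, dtype=float))[::-1]


def kge(classified, modeled):
    """Kling-Gupta efficiency between classified and modeled curves."""
    s = np.asarray(classified, dtype=float)
    o = np.asarray(modeled, dtype=float)
    r = np.corrcoef(s, o)[0, 1]          # linear correlation R
    alpha = s.std() / o.std()            # ratio of standard deviations
    beta = s.mean() / o.mean()           # ratio of means
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def error_sums(classified, modeled):
    """Total, 100-50% and 49-0% Errors as sums of absolute differences (in %)."""
    s = np.asarray(classified, dtype=float)
    o = np.asarray(modeled, dtype=float)
    diff = np.abs(s - o)
    assigned = o >= 50.0                 # split on the modeled abundance (assumption)
    return diff.sum(), diff[assigned].sum(), diff[~assigned].sum()


fam = np.array([100.0, 90.0, 70.0, 50.0, 30.0, 10.0])       # illustrative FAM
lsm = abundance_curve([25.0, 95.0, 60.0, 100.0, 5.0, 45.0])   # illustrative LSM output
total, high, low = error_sums(lsm, fam)
score = kge(lsm, fam)
```

Here the Total Errors (30) split into 20 for the 100-50% range and 10 for the 49-0% range; the KGE is below 1 mainly because the classified curve has a lower mean than the FAM.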

LSM Results
Since the starting image was spatially and spectrally resampled into 60 images ( Table 3) that were then divided into 10 zones, the LSM classifier was applied to 600 images, and their KGE, Total Errors, 100-50% Errors, and 49-0% Errors values were calculated.
In the 10 zones of the starting image, the pixel percentages of the Old Tiles, New Tiles, Lead, Concrete, Trachyte, Limestone, Asphalt, Pebbles, Sand, Wood, Grass, Trees, and Water classes ranged from 7 to 52%, 0 to 2%, 0 to 1%, 0 to 6%, 3 to 13%, 0 to 2%, 0 to 17%, 0 to 4%, 0 to 3%, 0 to 5%, 2 to 8%, 1 to 4%, and 10 to 40%, respectively. The pixel percentages of the materials classified as impervious, pervious, and vegetated surfaces ranged from 20 to 72%, 2 to 13%, and 3 to 12%, respectively, and their average percentages are 51, 6, and 8%, respectively. The mean percentage of the Water class is 23%. Since the pixel percentages of the impervious surfaces were higher than those of the other surface materials, the results for these materials were analyzed first for each endmember, using the average of the values for all zones, and then overall, using their weighted averages; the results for the materials classified as pervious and vegetated surfaces were analyzed using their weighted averages only. The KGE, Total Errors, 100-50% Errors, and 49-0% Errors values were assembled into arrays with six columns and five rows representing the selected spatial and spectral resolutions. Figure 5 shows the arrays used to assess and compare the classification results of the Old Tiles, New Tiles, Lead, Concrete, Trachyte, Limestone, and Asphalt endmembers (Figure 5a-f, respectively). In Figure 5, the matrices on the left show the values obtained from the images with the spectral range from 365 to 2500 nm, the matrices in the middle show the values obtained from the images with the spectral range from 400 to 1100 nm, and the matrices on the right show the difference between the values of the first and second matrices. Figure 6 shows the corresponding arrays for the impervious, pervious, and vegetated surfaces and for the Water endmember (Figure 6a-d, respectively).
In Figure 6, the matrices on the left show the values obtained from the images with the spectral range of 365-2500 nm, the matrices in the middle show the values obtained from the images with the spectral range of 400-1100 nm, and the matrices on the right show the difference between the values of the first and second matrices.
Since Total Errors, 100-50% Errors, and 49-0% Errors are given in absolute values, the negative errors (i.e., where the classified abundance fractions were greater than the FAMs) were counted: when they outnumber the positive errors, the values are colored black in Figures 5 and 6; otherwise, they are colored red. Similarly, the differences between the values calculated from the images with the spectral range of 365-2500 nm and those calculated from the images with the spectral range of 400-1100 nm are colored blue when the former are greater than the latter and green when the latter are greater than the former. Since the percentage of pixels in the Old Tiles class was greater than those of the other urban surface materials classified as impervious surfaces, the classification results of the Old Tiles endmember (Figure 5a) are comparable with those of the impervious surfaces (Figure 6a).
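The color rules used in Figures 5 and 6 reduce to two comparisons. In this sketch a signed error is defined as FAM minus classified abundance, so it is negative when the classified abundance exceeds the FAM, as stated above. The text does not specify how ties are colored; this sketch assigns them to red and green, respectively. The function names are hypothetical.

```python
import numpy as np


def error_color(classified, modeled):
    """'black' if negative errors (classified > FAM) outnumber positive ones, else 'red'."""
    signed = np.asarray(modeled, dtype=float) - np.asarray(classified, dtype=float)
    n_neg = np.count_nonzero(signed < 0)
    n_pos = np.count_nonzero(signed > 0)
    return "black" if n_neg > n_pos else "red"


def difference_color(value_365_2500, value_400_1100):
    """'blue' if the 365-2500 nm value is the greater one, otherwise 'green'."""
    return "blue" if value_365_2500 > value_400_1100 else "green"


print(error_color([90, 60, 70], [80, 50, 100]))  # black
```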

FAMs Calculated to Validate LSM Results
The common effort of works analyzing urban surface materials from remote sensing images is to improve thematic accuracy by searching for the most appropriate classifier and by carefully validating the results [44]. Most of these studies employed the spectral mixture analysis approach [17,20,21,23,24,27,31,38,41], and the literature highlights two forms of ground truth available to assess its thematic accuracy: a spectral library and a set of areas representing the pixel-by-pixel abundances of image endmembers [17,20,21,23,24,27,31,38,41,54]. Since the spectral library was used to simulate the starting image, the thematic accuracy of the classified urban surface materials was assessed using FAMs as ground truth, because they provide the abundance fractions of all pixels of the corresponding endmembers. Sixty FAMs for each endmember were used not only to validate the results of each image but also to assess how the different sizes and shapes of the endmembers affected the classification results. Figure 4 clearly shows how FAMs change with the size and shape of the endmembers: the size and shape of the Water endmember differ from those of the Old Tiles endmember, which in turn differ from those of the Trachyte endmember. Therefore, FAMs allow the image endmembers to be grouped according to their sizes and shapes. Arranging the features of the best-fitting functions in descending order highlighted two groups of endmembers: the first group includes the Water, Asphalt, Lead, Concrete, Old Tiles, and New Tiles endmembers, and the second includes the Grass, Trees, Trachyte, Sand, Pebbles, Wood, and Limestone endmembers. In the first group, the averages of the slopes of the best-fitting functions and of their standard deviations range from −1.13% to −4.34% and from 0.4% to 1.55%, respectively; in the second group, they range from −7.71% to −20.51% and from 4.17% to 6.43%, respectively.
The difference between these groups is very marked because this ranking separates materials that can be mapped as large shapes with regular (non-jagged) outlines from those that can be mapped only as small, jagged shapes.
With reference to the impervious surfaces, the spectral resolution and spectral range requirements were met by the KGE, Total Errors, 100-50% Errors, and 49-0% Errors values (Figure 6a), except for the error values obtained from the images at spatial resolutions of 50, 100, and 250 m, which did not meet the spectral range requirement. In other words, the amount of error in classifying images with a high percentage of mixed pixels depends only weakly on reducing the spectral range and spectral resolution of the images. The spatial resolution requirement was not met by the KGE values: the images with an average percentage of mixed pixels of about 44% (Table A3) show the greatest KGE values; the values decrease very slightly as the mixed pixel percentage decreases and rapidly as it increases, except for the images at spectral resolutions of 3 and 10 nm with a spatial resolution of 250 m. These images show KGE values lower than those obtained from the images with a spatial resolution of 100 m. The spatial resolution requirement was met by the Total Errors values obtained from the images with the spectral range of 365-2500 nm but not by those obtained from the images with the spectral range of 400-1100 nm; the latter show the same behavior as the KGE values.
It is important to underline that the images at the spectral resolution of 100 nm and at the spatial resolutions of 1, 5, and 10 m with a spectral range of 400-1100 nm show greater Total Errors and 100-50% Errors values than those obtained from all images at spatial resolutions of 50, 100, and 250 m. In other words, multispectral sensors with high spatial resolution in the spectral range between 400 and 1100 nm have a lower capability to distinguish the urban surface materials than hyperspectral sensors with low spatial resolution in the same spectral range.
The analysis of the fractional abundance images of impervious surfaces is very important for urban land cover mapping, not only because their pixel percentage is greater than those of the other urban surface materials but also because these materials have a more negative impact on the environment than other urban materials [14][15][16][17].
With reference to the pervious surfaces, the spectral resolution and spectral range requirements were met by the KGE, Total Errors, 100-50% Errors, and 49-0% Errors values (Figure 6b). The spatial resolution requirement was not met by the KGE values: the greatest values were obtained from the images at a spatial resolution of 5 m (mean percentage of mixed pixels equal to 47%, Table A3); the values decrease slightly as spatial resolution increases and rapidly as it decreases, except at the spatial resolution of 250 m. The spatial resolution requirement was met by the Total Errors values obtained from the images with both spectral ranges.
None of the requirements is met by the KGE values of vegetated surfaces; the images at a spatial resolution of 5 m (mean percentage of mixed pixels equal to 45%, Table A3) exhibit the greatest KGE values. The spatial resolution and spectral range requirements were partially met by the Total Errors, 100-50% Errors, and 49-0% Errors values. In other words, the vegetation signatures are distinguishable from those of the other urban surface materials regardless of the spectral range and spectral resolution of the images (Figure 2 [27,31,58,59]).
The spatial and spectral resolution requirements are not met by the KGE and Total Errors values of the Water endmember (i.e., the endmember with the smallest KGE values). The images at a spatial resolution of 10 m (mean percentage of mixed pixels equal to 43%, Table A3) with the spectral range of 365-2500 nm exhibit the greatest KGE values and the smallest Total Errors values, while among the images with the spectral range of 400-1100 nm, those at a spatial resolution of 5 m exhibit the highest KGE values (Figure 6d). The Total Errors values obtained from the images with the spectral range of 400-1100 nm increase as spatial resolution decreases, except for the values obtained from the image at a spatial resolution of 250 m, which are smaller than those obtained from the image at 100 m. The spectral range requirement was met by the values of the Water endmember. Since water column signatures are characterized by very low reflectance in the visible and by reflectance close to zero in the other spectral regions (Figure 2), they are not well distinguishable from those of the other urban surface materials, especially when images with the spectral range of 365-2500 nm are classified [60,61].

The Percentages of Mixed Pixels
On the one hand, the analysis of the LSM classification results confirmed that the capability to retrieve impervious and pervious surfaces generally decreases with decreasing spectral resolution and spectral range. On the other hand, it shows that the capability does not always decrease with decreasing spatial resolution, because it depends more on the increase in the percentage of mixed pixels than on the decrease in spatial resolution. Overall, the capability depends above all on whether or not the 50% threshold of mixed pixels is crossed. The analysis of the fractional abundance images showed how mixed pixel percentages affect the capability to classify the urban surface materials: the KGE values reach their maximum in the images with mixed pixel percentages of about 50% and decrease slightly as the mixed pixel percentage decreases, whereas they decrease rapidly as it increases; finally, the KGE values become nearly constant in the images with mixed pixel percentages equal to 100%.
The Total Errors, 100-50% Errors, and 49-0% Errors values are also affected by the variation in the mixed pixel percentage: the error values of some endmembers behave similarly to the KGE values, whereas those of other endmembers decrease or increase rapidly in the images with a percentage of mixed pixels greater than 50%. Notably, the 100-50% Errors values obtained from the images with a mean percentage of mixed pixels less than 50% were greater than the 49-0% Errors values; this difference is greater in the images with the spectral range of 400-1100 nm than in those with the spectral range of 365-2500 nm, and greater in the images at high spatial resolution than in those at low spatial resolution. Overall, the 100-50% Errors values obtained from the images with mixed pixel percentages less than 50% reveal an underestimation of the endmember abundances between 100 and 50% (the percentage errors colored in red in Figures 5 and 6), whereas those obtained from the images with mixed pixel percentages greater than 50% reveal an overestimation of these abundances (the percentage errors colored in black in Figures 5 and 6).
As the spatial resolution decreases, the behavior of the KGE and error values reveals a division of the endmembers into two groups: the first shows a strong change in behavior between the images at spatial resolutions of 10 m or finer and those at 50 m or coarser; the second shows a strong change between the images at spatial resolutions of 5 m or finer and those at 10 m or coarser. The first group includes five of the seven endmembers identified as impervious surfaces and the Water endmember (their mixed pixel percentages in these images are about 44%); the second group includes the remaining endmembers (their mixed pixel percentages in these images are about 46%). These two groups correspond to those previously identified by examining the best-fitting functions: the first group includes the materials with large, non-jagged shapes, and the second includes the materials with small, jagged shapes. Therefore, the capability does not depend only on the shape of the endmembers but on the characteristic of the scene that combines their size and shape [35].
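Linear spectral mixture analysis models each pixel spectrum as a weighted sum of endmember spectra, with the weights (fractional abundances) summing to one. The sketch below solves this with ordinary least squares and enforces the sum-to-one constraint through a heavily weighted extra row. It is an illustrative assumption, not necessarily the solver used in this study (which may also impose non-negativity); the function name and the `delta` weight are hypothetical.

```python
import numpy as np


def unmix_sum_to_one(pixels, endmembers, delta=1e3):
    """Estimate fractional abundances with a linear mixing model.

    pixels:     (n_pixels, n_bands) reflectance array
    endmembers: (n_endmembers, n_bands) endmember spectra
    delta:      weight of the sum-to-one constraint row (assumed value)
    Returns an (n_pixels, n_endmembers) array of abundances.
    """
    E = np.asarray(endmembers, dtype=float).T  # (n_bands, n_endmembers)
    X = np.asarray(pixels, dtype=float).T      # (n_bands, n_pixels)
    # Append a heavily weighted row of ones so that the abundances sum to ~1
    E_aug = np.vstack([E, delta * np.ones((1, E.shape[1]))])
    X_aug = np.vstack([X, delta * np.ones((1, X.shape[1]))])
    # Least-squares solution for all pixels at once: (n_endmembers, n_pixels)
    A, *_ = np.linalg.lstsq(E_aug, X_aug, rcond=None)
    return A.T


if __name__ == "__main__":
    # Two hypothetical endmember spectra sampled at four bands
    endmembers = np.array([[0.1, 0.2, 0.3, 0.4],
                           [0.5, 0.4, 0.3, 0.2]])
    pixel = 0.3 * endmembers[0] + 0.7 * endmembers[1]
    print(unmix_sum_to_one(pixel[None, :], endmembers))  # close to 0.3 and 0.7
```

Because the constraint is a weighted row rather than a hard equality, the abundances sum to one only approximately; a larger `delta` tightens the constraint at the cost of conditioning.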
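The link between spatial resolution, endmember size and shape, and mixed pixel percentage can be illustrated by degrading a fine-resolution binary mask of one material to coarser pixels. The sketch below assumes block averaging by an integer factor (e.g., 0.50 m to 5 m is a factor of 10) and counts as mixed the pixels that contain the material with an abundance below 99%, following the definition given in the appendix; the function names and the block-averaging scheme are assumptions, not the study's exact resampling procedure.

```python
import numpy as np


def abundance_at_coarse_resolution(mask, factor):
    """Block-average a binary class mask (1 = material present) by an integer factor."""
    h, w = mask.shape
    h_c, w_c = h // factor, w // factor
    # Drop edge rows/columns that do not fill a complete coarse pixel
    trimmed = np.asarray(mask, dtype=float)[: h_c * factor, : w_c * factor]
    return trimmed.reshape(h_c, factor, w_c, factor).mean(axis=(1, 3))


def mixed_pixel_percentage(mask, factor, pure_threshold=0.99):
    """Percentage of the coarse pixels containing the material that are mixed."""
    abundance = abundance_at_coarse_resolution(mask, factor)
    present = abundance > 0
    if not present.any():
        return 0.0
    mixed = present & (abundance < pure_threshold)
    return 100.0 * mixed.sum() / present.sum()


if __name__ == "__main__":
    # A large, compact patch versus a jagged (checkerboard) pattern of the same extent
    compact = np.zeros((20, 20))
    compact[:10, :10] = 1
    jagged = np.indices((20, 20)).sum(axis=0) % 2
    print(mixed_pixel_percentage(compact, 5))  # 0.0
    print(mixed_pixel_percentage(jagged, 5))   # 100.0
```

The two toy masks show why the same coarsening factor leaves a large, non-jagged shape unmixed while a small, jagged shape becomes entirely mixed.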

Ranking of the Capabilities of Most Remote Sensing Images
The capability of remote sensing images to distinguish the urban surface materials determines whether a pixel is assigned to the right class or to a different one, and it depends on the characteristics of the scene and of the image (i.e., the size and shape of the endmembers, and the spatial resolution, spectral resolution, and spectral range of the images). To assess how much each characteristic affects the capability for each urban surface material, the images sharing the same value of that characteristic were grouped together, giving a number of sets that depends on the characteristic (i.e., two, six, five, and two sets of images grouped according to the 50% threshold of mixed pixels, the spatial resolution, the spectral resolution, and the spectral range, respectively). The KGE values of each set were averaged and compared with those of every other set grouped by the same characteristic: the average KGE values of the images with mixed pixel percentages below 50% were compared with those of the images with mixed pixel percentages above 50%, and the average KGE values of the sets with different spatial resolution, spectral resolution, or spectral range were compared with each other. Since KGE values range from 1 to minus infinity, one unit was subtracted from all values before averaging and added back once the average was calculated. Table 4 shows the average KGE values obtained from the images with the spectral range of 365-2500 nm and from those with the spectral range of 400-1100 nm, each split into mixed pixel percentages less than 50% and more than 50%.

Table 4. The average KGE values obtained from groups of images at each spectral range (365-2500 nm and 400-1100 nm) with mixed pixel percentages less than 50% and more than 50%.

On the whole, these KGE values show the following ranking of capabilities in descending order: vegetated surfaces, impervious surfaces, pervious surfaces, and the Water endmember. The values obtained from the images with the spectral range of 400-1100 nm are an exception: in the images with mixed pixel percentages less than 50%, the pervious surfaces show greater capabilities than the impervious surfaces, whereas in the images with mixed pixel percentages more than 50%, the pervious surfaces show the smallest capabilities.
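The KGE scores compared in this section can be sketched, under assumptions, with the original Kling-Gupta efficiency formulation, which combines the linear correlation, the ratio of standard deviations, and the ratio of means, taking the reference fractional abundance image as the observation and the LSMA abundance image as the estimate. The study may use a variant of KGE, and the function names are hypothetical. The second function follows the group-averaging step described above: subtract one unit, average, and add the unit back.

```python
import numpy as np


def kge(estimated, reference):
    """Kling-Gupta efficiency of an estimated abundance image (original formulation, assumed).

    Returns 1 for a perfect estimate; values decrease without a lower bound.
    The reference must have non-zero mean and standard deviation.
    """
    s = np.asarray(estimated, dtype=float).ravel()
    o = np.asarray(reference, dtype=float).ravel()
    r = np.corrcoef(s, o)[0, 1]   # linear correlation
    alpha = s.std() / o.std()     # variability ratio
    beta = s.mean() / o.mean()    # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def average_kge(values):
    """Average a group of KGE values: subtract one unit, average, then add it back."""
    shifted = np.asarray(values, dtype=float) - 1.0
    return shifted.mean() + 1.0


if __name__ == "__main__":
    reference = np.array([0.0, 0.25, 0.5, 1.0])  # hypothetical abundances
    print(kge(reference, reference))            # close to 1.0 for a perfect estimate
    print(kge(2 * reference, reference))        # overestimation lowers the score
```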
The differences between the capabilities of the image sets grouped by mixed pixel percentage show that the greatest loss of capability occurs in the retrieval of the pervious surfaces and Water endmember.
In the second column of Table 4, the average KGE values are very similar, varying between 0.54 and 0.23 (i.e., mean and standard deviation values are equal to 0.43 and 0.14, respectively). The average KGE value of impervious surfaces is greater than that of pervious surfaces, but the difference between them is very small (i.e., 0.04). In the third column, the average KGE values are very scattered, varying between 0.16 and −2.87 (i.e., mean and standard deviation values are equal to −1.14 and 1.46, respectively). The average KGE value of impervious surfaces is much greater than that of pervious surfaces (i.e., the difference between them is equal to 1.81). In the fourth column, the average KGE values are very similar, varying between 0.48 and 0.04 (i.e., mean and standard deviation values are equal to 0.28 and 0.19, respectively). The average KGE value of pervious surfaces is greater than that of impervious surfaces, but the difference between them is small (i.e., 0.12). In the fifth column, the average KGE values are very scattered, varying between −0.17 and −8.92 (i.e., mean and standard deviation values are equal to −3.41 and 4.09, respectively). The average KGE value of impervious surfaces is much greater than that of pervious surfaces (i.e., the difference between them is equal to 8.49).
In short, the KGE values of the images with mixed pixel percentages below 50% are always positive and, overall, not scattered, whereas those of the images with mixed pixel percentages above 50% are mostly negative (except for vegetated surfaces), sometimes strongly so, and, overall, very scattered. This difference in behavior highlights that the capability of the former images to distinguish the endmembers depends primarily on their internal spectral variability, whereas that of the latter depends primarily on the spectral characteristics of all endmembers and their variabilities. Moreover, this difference was detected in the images at all resolutions, including those with high spatial resolution and low spectral resolution, which does not confirm what was claimed by Small [17].
It is important to recall the characteristics of these two groups of surface materials: the first, which includes materials with large, non-jagged shapes, shows mixed pixel percentages less than 50% in images at spatial resolutions of 10 m or finer, and the second, which includes materials with small, jagged shapes, shows mixed pixel percentages less than 50% in images at spatial resolutions of 5 m or finer. Therefore, the image sets grouped by mixed pixel percentage and those grouped by spatial resolution are very similar but not identical. The average KGE values of the impervious, pervious, and vegetated surfaces and the Water endmember were calculated by averaging the values obtained from the images at all spectral resolutions for each spatial resolution (Table 5). On the whole, the KGE values calculated from the images with the spectral range of 365-2500 nm show the following ranking of capabilities in descending order: vegetated, impervious, and pervious surfaces, and the Water endmember; the only exceptions are the values of the vegetated surfaces and the Water endmember obtained from the images at spatial resolutions of 10 and 100 m. The KGE values calculated from the images at spatial resolutions of 1 and 5 m with the spectral range of 400-1100 nm show the following ranking in descending order: vegetated and pervious surfaces, the Water endmember, and impervious surfaces. On the whole, the KGE values calculated from the images at spatial resolutions of 10, 50, 100, and 250 m with the spectral range of 400-1100 nm show the following ranking in descending order: vegetated, impervious, and pervious surfaces, and the Water endmember.
In this case, the capability loss was calculated by subtracting the capability of the images grouped at each spatial resolution from that of the images grouped at the immediately finer spatial resolution. The greatest loss of capability occurs between the spatial resolutions of 10 and 50 m for impervious surfaces and between 50 and 100 m for pervious and vegetated surfaces and the Water endmember, except for the vegetated surfaces in the images with the spectral range of 400-1100 nm, for which the greatest loss occurs between 100 and 250 m.
These rankings confirm those based on mixed pixel percentages, except for the average values obtained from the images at a spatial resolution of 10 m, where only the mixed pixel percentages of impervious surfaces and the Water endmember are below 50%. Moreover, the average values calculated from the images at spatial resolutions of 1 and 5 m (where all endmembers exhibit mixed pixel percentages below 50%) show a very small standard deviation (0.17), whereas those calculated from the images at spatial resolutions of 50 and 100 m show large standard deviations that increase as spatial resolution decreases (0.69 and 8.95, respectively). The average values calculated from the images at a spatial resolution of 250 m show a standard deviation (2.18) smaller than that of the images at 100 m but greater than that of the images at 50 m.
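The capability-loss step can be sketched as differences between consecutive resolutions. The sketch assumes that the "greater" resolution refers to the larger pixel size (or wider band), so the loss is the average KGE at one resolution minus the average KGE at the next coarser one, and positive values mean capability is lost; the same function applies to spectral resolutions in nanometers. The function names and the dictionary values in the usage block are hypothetical, not results of this study.

```python
def capability_losses(mean_kge):
    """Capability loss between each resolution and the next coarser one.

    mean_kge: dict mapping a resolution (pixel size in m, or band width in nm)
              to the average KGE of the images grouped at that resolution.
    Returns a list of ((finer, coarser), loss) pairs with
    loss = KGE(finer) - KGE(coarser); positive values mean capability is lost.
    """
    resolutions = sorted(mean_kge)
    return [
        ((finer, coarser), mean_kge[finer] - mean_kge[coarser])
        for finer, coarser in zip(resolutions, resolutions[1:])
    ]


def greatest_loss(mean_kge):
    """Return the consecutive pair of resolutions with the largest capability loss."""
    return max(capability_losses(mean_kge), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical average KGE values per spatial resolution (m), for illustration only
    example = {1: 0.55, 5: 0.50, 10: 0.40, 50: -0.60, 100: -1.00, 250: -1.20}
    print(greatest_loss(example))  # the 10 m -> 50 m step has the largest loss
```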
To rank the capabilities of the image sets grouped by spectral resolution, the mean KGE values were calculated by averaging the values obtained from the images at all spatial resolutions for each spectral resolution (Table 6). The vegetated surfaces show the best capability to be distinguished from remote sensing images, except in high spectral resolution images, where the impervious surfaces show the best capability. On the whole, the impervious surfaces show the second-best capability, followed by the Water endmember and the pervious surfaces. The average values obtained from the images with the spectral range of 365-2500 nm show that the vegetated surfaces have slightly greater values than the impervious surfaces, which in turn have much greater values than the Water endmember and the pervious surfaces. The average values obtained from the images with the spectral range of 400-1100 nm show that the differences between the average values of the surface materials are much greater, and the values of the pervious surfaces are smaller than −5.
The capability loss was calculated by subtracting the capability of the images grouped at each spectral resolution from that of the images grouped at the immediately finer spectral resolution. In the images with the spectral range of 365-2500 nm, the greatest capability loss occurs between the spectral resolutions of 3 and 10 nm for impervious and pervious surfaces, between 30 and 50 nm for vegetated surfaces, and between 50 and 100 nm for the Water endmember. In the images with the spectral range of 400-1100 nm, the greatest loss of capability occurs between the spectral resolutions of 30 and 50 nm for impervious surfaces and the Water endmember, and between 50 and 100 nm for pervious and vegetated surfaces. The value of vegetated surfaces obtained from the images at the spectral resolution of 100 nm is greater than that obtained from the images at the spectral resolution of 50 nm. The same pattern is also observed in the values of the impervious surfaces obtained from the images with the spectral range of 365-2500 nm, whereas the value of pervious surfaces obtained from the images at the spectral resolution of 250 nm is greater than that obtained from the images at the spectral resolution of 100 nm.
To rank the capabilities of the image sets grouped by spectral range, the mean KGE values were calculated by averaging the values obtained from all images for each spectral range (Table 7). On the whole, these KGE values show the following ranking of capabilities in descending order: vegetated, impervious, and pervious surfaces, and the Water endmember. The values obtained from the images with the spectral range of 400-1100 nm are an exception: the pervious surfaces exhibit the smallest capability.
In short, the capabilities to distinguish the urban surface materials obtained from the images with different percentages of mixed pixels, at different spatial or spectral resolutions, and with different spectral ranges show the same ranking: vegetated, impervious, and pervious surfaces, and the Water endmember. In other words, the capability to map pervious surfaces from remote sensing images is strongly affected by scene and sensor characteristics, whereas the capability to map vegetated surfaces, and secondarily impervious surfaces, is less affected by them.
This analysis provided an estimate of the capability of each image set, but it did not assess how much the increase in mixed pixel percentage affected the capability to distinguish the urban surface materials compared with the decrease in spatial resolution, spectral resolution, and spectral range. For this purpose, the capability losses due to the increase in mixed pixel percentage and to the decrease in spectral range and in spatial and spectral resolution were averaged and compared (Table 8). Overall, the capability losses depending on scene and sensor characteristics show the following ranking in descending order: the loss due to the decrease in spatial resolution, the loss due to the decrease in spectral range, the loss due to the increase in mixed pixel percentage (which reflects endmember size and shape), and the loss due to the decrease in spectral resolution. The loss of capability to distinguish impervious surfaces is an exception, because the losses due to the decrease in spectral range and to the increase in the percentage of mixed pixels are greater than that due to the decrease in spatial resolution. The values of the capability loss show that not only the decrease in spatial resolution weighs heavily on the capability to distinguish the urban surface materials, but also the decrease in spectral range and the increase in the percentage of mixed pixels, whereas the decrease in spectral resolution weighs much less. Moreover, scene and sensor characteristics weigh more heavily on the distinction of pervious surfaces than on that of the other urban surface materials.
The study of the urban environment from remote sensing data must now, more than ever, be supported by analyses of the capability of these data to distinguish the urban surface materials, in order to take full advantage of the available sensor features. To this end, future work will first identify misassigned endmembers and analyze their capability to distinguish urban surface materials, and then experiment with other classifiers and compare their results.
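Comparing the average capability losses across factors amounts to averaging, for each factor, the losses obtained for the individual surface materials and sorting the factors by that average. A minimal sketch follows; the function name and the per-material numbers in the usage block are hypothetical, and the averaging behind Table 8 may weight the materials differently.

```python
from statistics import mean


def rank_factors(losses_by_factor):
    """Average the capability losses of each factor and sort the factors, largest first.

    losses_by_factor: dict mapping a factor name to the capability losses
                      computed for the individual surface materials.
    """
    averages = {factor: mean(losses) for factor, losses in losses_by_factor.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical per-material losses, for illustration only
    losses = {
        "spatial resolution": [0.3, 0.2, 6.0],
        "spectral range": [0.4, 0.1, 4.0],
        "mixed pixel percentage": [0.5, 0.2, 3.0],
        "spectral resolution": [0.1, 0.1, 0.4],
    }
    for factor, average in rank_factors(losses):
        print(f"{factor}: {average:.2f}")
```

A single material with a very large loss (the third value in each toy list) can dominate the average, which is consistent with the text's observation that pervious surfaces drive much of the overall loss.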

Conclusions
The aim of this work was to assess and compare the capabilities of most remote sensing images to distinguish the urban surface materials. This comparison is now more essential than ever, considering that every remote sensing sensor is used to map urban surface materials [13,45] and that many countries are investing in the analysis of the impact of urban areas on the environment by exploiting remote sensing data [18,22,25–27,30,32,33,38,39,41].
The results show that the capability to distinguish the urban surface materials depends on the spatial and spectral resolution and the spectral range of the images, and on the percentage of mixed pixels. In particular, the behavior of the capabilities is significantly affected by whether or not the mixed pixel percentage exceeds 50%. Fractional abundance models show that only some impervious surfaces, characterized by large, non-jagged shapes, exceed this threshold of mixed pixels in the images at 50 m spatial resolution, whereas the remaining impervious surfaces and the pervious and vegetated surfaces, characterized by small, jagged shapes, exceed it in the images at 10 m spatial resolution.
The overall analysis shows that the decrease in the spatial resolution of remote sensing images causes the greatest loss of capability to distinguish the urban surface materials, followed by the decrease in spectral range, the increase in mixed pixel percentage, and lastly the decrease in spectral resolution (average capability losses of 2.13, 1.51, 1.21, and 0.21, respectively). Moreover, the data highlight that scene and sensor characteristics weigh more heavily on the retrieval of pervious surfaces than on that of the other urban surface materials, because the pervious surfaces show the greatest loss of capability due not only to the decrease in spatial resolution but also to the decrease in spectral range and the increase in mixed pixel percentage (capability losses of 6.85, 4.71, and 3.56, respectively). The capability to map vegetated surfaces, and secondarily impervious surfaces, is much less influenced by scene and sensor characteristics than the capability to map pervious surfaces (average capability losses of 0.21, 0.23, and 3.83, respectively).
In conclusion, since every work that mapped urban cover using remote sensing imagery reported a large number of errors due to poorly distinguishable urban surface materials [44,45], this assessment of capability behavior is a very useful tool for selecting the most suitable remote sensing image and/or deciding on the complementary use of other data and/or methods.

Figure A1. The steps performed to create the synthetic starting image: (a) the "Old Tiles" mask, with a green square showing the detail highlighted in (c,d); (b) the starting synthetic image at 665 nm; (c) the detail of the "Old Tiles" mask multiplied by the mean spectrum plus the standard deviation; (d) the detail of the "Old Tiles" mask multiplied by the mean spectrum minus the standard deviation.

Figure A2. The steps performed to calculate the percentage of pixels with abundance equal to 100% and greater than 50%: (a) the "Old Tiles" mask at a spatial resolution of 0.50 m; (b) the "Old Tiles" mask at a spatial resolution of 5 m; (c) the "Old Tiles" mask resampled twice, first at 5 m and then at 0.50 m, colored in red and superimposed on the "Old Tiles" mask at a spatial resolution of 0.50 m, colored in white; (d) the buffer zone identified by distances less than or equal to 2.5 m from the twice-resampled "Old Tiles" mask, colored in blue and superimposed on the "Old Tiles" masks of (c).

The appendix also contains the features of the best-fit functions of the percentages of pixels with 100% abundance and with abundance greater than 50%, that is, the information about the two steps of FAM development that led to the division of the urban surface materials into two groups. In addition, the appendix lists the average mixed pixel percentages (i.e., the percentages of pixels with less than 99% abundance) of all classes and surfaces, computed for the images at all spatial resolutions by exploiting the FAMs (Table A3).