Influence of Spatial Resolution for Vegetation Indices’ Extraction Using Visible Bands from Unmanned Aerial Vehicles’ Orthomosaics Datasets

Abstract: The consolidation of unmanned aerial vehicle (UAV) photogrammetric techniques for campaigns with high and medium observation scales has triggered the development of new application areas. Most of these vehicles are equipped with common visible-band sensors capable of mapping areas of interest at various spatial resolutions. It is often necessary to identify vegetated areas for masking purposes during the postprocessing phase, excluding them from digital elevation model (DEM) generation or from change detection. However, vegetation is usually extracted using sensors capable of capturing the near-infrared part of the spectrum, which cannot be recorded by visible (RGB) cameras. In this study, after reviewing different visible-band vegetation indices in various environments using different UAV technology, the influence of the spatial resolution of orthomosaics generated by photogrammetric processes on vegetation extraction was examined. The triangular greenness index (TGI) provided a high level of separability between vegetation and nonvegetation areas for all case studies at any spatial resolution. The efficiency of the indices remained fundamentally linked to the context of the scenario under investigation, and the correlation between spatial resolution and index incisiveness was found to be more complex than might be trivially assumed.


Introduction
The last decade has witnessed a rapid consolidation of photogrammetric techniques following the advancement of increasingly powerful structure from motion-multiview stereo (SfM-MVS) algorithms for feature matching between images [1,2]. At the core of this rapid growth has been the widespread and frequent use of unmanned aerial vehicles (UAVs) for high- and medium-scale observation campaigns [3]. Indeed, for these scales of investigation, UAV platforms can provide higher spatial resolution products than traditional aerial or satellite observations [4]. Today, a variety of UAVs, characterised by different flight mechanics and take-off weights, and of mountable sensors is continuously put on the market, providing a wide choice for operators in the sector even at minimal costs [5]. In turn, their versatility and transversality have triggered new application areas [6-8].
In most cases, these UAVs are equipped with inexpensive cameras capable of acquiring images in the visible bands (RGB). Although they are not metric cameras, several studies have investigated their peculiarities, highlighting the possibility of obtaining results that are quite comparable to metric ones both in terms of geometric calibration of the lenses and of the photogrammetric products that can be derived [9,10]. Recently, the integration in the vehicles of inertial measurement units (IMUs) and increasingly accurate global navigation satellite system (GNSS) receivers, capable of real-time kinematic (RTK) or postprocessing kinematic (PPK) measurements, has made georeferencing strategies often independent of laborious and expensive field measurement operations [11,12]. Despite this, the scientific community is still engaged in finding optimised methodologies to mitigate the various uncertainties arising from lens distortions, the inertial measurement unit, and the sensor's interior and exterior orientation parameters. Nevertheless, several recent works have attested to the validity of these products according to the scientific community's accuracy standards [13,14].
In various application areas, it is often advantageous to run algorithms that automatically extract vegetated areas and characterise them [15,16]. In these cases, extraction can be easily achieved by using professional multiband sensors that include a band dedicated to the near-infrared (NIR) part of the spectrum (approximately 760-900 nm), which commercial RGB cameras cannot capture. However, these sensors are sophisticated and, above all, costly compared to commercial RGB cameras, which makes operations unprofitable and constrained. RGB cameras are often preferred among different sensors due to their low-cost availability, low power requirements, ease of use, and flexibility in implementation [17].
Therefore, this study concerns the investigation of vegetation indices (VIs) generated from visible bands taken from available and user-friendly sensors [18,19].
In this study, after examining the performance of several visible-band VIs in various environments using different UAV technologies, the impact of spatial resolution on vegetation extraction with visible-band VIs was evaluated. Indeed, as stated in Agapiou et al. [20] and Niederheiser et al. [21], spatial resolution is a key characteristic for vegetation mapping in remote sensing imagery of heterogeneous landscapes. In this regard, Räsänen et al. [22] compared multisensor and multiresolution products and analysed their vegetation mapping efficiency in terms of classification performance. As Kwan et al. [23] attested, the very high spatial resolution of UAV imagery often causes noise effects due to an increase in detectable targets, so it is essential to investigate the optimal resolution in each scenario in order to map vegetation efficiently.
The manuscript is organised as follows: Section 2 describes the areas surveyed, the acquired data, and the technologies used. A description of the methodologies adopted to process these data is presented in its subsections. The analysis of the results obtained is addressed in Section 3, with a discussion in Section 4. Finally, the conclusions report the findings and future proposals for investigation.

Related Works
UAV images and photogrammetric outcomes make it possible to obtain many precise measurements of vegetation quickly and easily, to define its characteristics, to extract it from the entire product, and to manage it for other purposes [24]. For example, for the generation of digital elevation models (DEMs), it is necessary to exclude vegetated areas through masking operations; in other cases, it is useful to monitor temporal changes, as in crop yield estimation, land-cover and land-use monitoring, urban growth monitoring, drought monitoring, etc. [6,25]. Several authors note that the NIR part of the spectrum has been widely exploited in remote sensing applications [26,27], leading to the implementation of numerous VIs. These indices are formulated based on different mathematical equations that can detect healthy vegetation, taking into account atmospheric effects and ground reflection noise [16,28]. One of the most well-known and widely used VIs is the so-called normalised difference vegetation index (NDVI). NDVI is calculated using the near-infrared and red-band reflectance values of multispectral images [25]. Although several VIs are available for vegetation extraction, selecting the most appropriate one for a specific application remains a challenge. This, of course, depends mainly on the scenario under investigation [29].
On the other hand, other authors have proposed to investigate the advantages of using common sensors in the visible bands and then to evaluate their performance compared with previous sensors [30,31]. The need to structure pre-processing and postprocessing methodologies for geometric and radiometric contents in order to make these at least comparable with more sophisticated sensors has therefore emerged [32].
Remote Sens. 2021, 13, 3238
Consumer cameras often have the problem of not being radiometrically calibrated [4,33]. Indeed, to provide remote sensing data with a quantitative value, it is necessary to calibrate them both geometrically and radiometrically and then make an absolute atmospheric correction [34]. Precisely, the calibration allows the recovery of the existing relationship between the pairs of position and radiance on the ground and the coordinate and brightness of the image, respectively.
Above all, data in the visible range are influenced by sensor characteristics, illumination, geometry, and atmospheric conditions [35]. Sensor calibration is achieved using known gain and offset coefficients to convert digital numbers (DNs) into sensor radiance and then, after normalisation, into sensor reflectance.
Several methods took into account the effects of illumination and atmosphere on sensor radiance, including normalisation to a spectrally flat target or image average, radiative transfer models that simulate the interaction between radiation and the atmosphere, and surface empirical relationships between sensor radiance and ground reflectance [34]. Due to these calibration methods' technical limitations, there is a need to identify a feasible and cost-effective radiometric calibration method when processing images collected by commercial digital cameras using UAVs [4].
Owing to regulatory restrictions, UAVs fly at low altitudes, generally below 150 m above ground, which results in a larger number of collected images than would be acquired from a satellite platform or piloted aircraft over the same area [36]. This makes it difficult to perform in situ at-surface reflectance calibration measurements for all the images acquired by UAVs [37], since numerous calibration targets must be placed in the field so as to cover the area of interest homogeneously. The result is longer field activities and a significant effort in the field, especially given the difficulties encountered in more impervious scenarios [38].
For this purpose, to perform a radiometric calibration of the generated orthomosaics, the variability of the results obtained from applying the empirical line method (ELM) [39] was compared across different spatial resolutions. The calibrations were validated by comparing the spectral signatures extracted for targets in vegetated, asphalt, and bare soil areas with those found in the literature. This allowed us to assess the accuracy of the calibration process and, therefore, the level of confidence in interpreting the derived products.

Acquired Datasets
For the needs of the current study, three different datasets were selected, based on the following criteria: (1) having a different context, (2) being captured by different UAV/camera sensors, (3) having a different georeferencing strategy, (4) being captured at a different altitude above ground level (AGL), and (5) having a different ground sample distance (GSD). These five characteristics of the three datasets allow us to prove the versatility and non-specificity of the workflow proposed in the research. A preview of these areas can be found in Figure 1. Case study (a) was a construction site not far from the village centre of Fasoula in the Limassol district in Cyprus (Figure 1a), where vegetation was randomly scattered. An out-of-town environment in Grottole in the province of Matera (Italy) was selected as case study (b) (Figure 1b), where high vegetation, bare soil, and, above all, a viaduct were visible. An abandoned archaeological area, named Punta Penna because of the promontory on the sea where it stands, in Torre a Mare, the southernmost district of the city of Bari (Italy), was considered as the third case (c) (Figure 1c). It should be mentioned that water was visible in this dataset. Table 1 shows the characteristics and technologies used for each dataset.
For the second dataset, a GNSS acquisition campaign of 11 ground control points (GCPs) was carried out, measured in network real-time kinematic (nRTK) mode with an average accuracy of 2 cm along the three axes, in order to perform an indirect georeferencing of the photogrammetric products. For the other case studies, direct georeferencing was preferred: in the first case, using the geo-tags of each image measured in RTK by the receiver on board the vehicle (average accuracy of about 10 cm); in the third case, using the geo-tags measured with a low-performance GNSS receiver (average accuracy of 3 m).

Photogrammetric Processing
The processing of the collected datasets was based on the workflow proposed in [14,40,41]. The different parameterisations for each dataset are detailed in this section. Agisoft Metashape software (v.1.4.1, Agisoft LLC, St. Petersburg, Russia) was used to generate the photogrammetric products, running on an Intel(R) Core(TM) i7-3970X CPU at 3.50 GHz with 16 GB of RAM and an NVIDIA GeForce GTX 650 graphics card.
Three chunks were generated, and the workspace was adjusted, as shown in Table 2. The first step was to correctly set up the workspace and remove any blurred images, which could compromise the final results. After running the Estimate Image Quality tool (Agisoft Metashape), only the images with a quality value above the threshold of 0.7 were included in the processing chain. Briefly, this tool provides information about the sharpest borders detected in the image and can be used to find blurred images. For the third case study, it was necessary to manually fix the GPS/INS offset, i.e., the lever-arm vector, to (0.005 ± 0.005, 0.100 ± 0.01, 0.250 ± 0.01) m. In contrast, for the other cases, this was automatically computed and recorded in each geotag by the technology on board each aircraft. No manipulations of the radiometric information, such as varying illumination and contrast, were performed, so as not to compromise the original data.
The sparse point cloud reconstruction was started in the next step, initiating the camera alignment processes as indicated in Table 2. The point clouds obtained were filtered using thresholds (Table 2), already validated in Saponaro et al. [42], for reconstruction uncertainty, projection accuracy, and reprojection error. This allowed the bundle block adjustment (BBA) algorithms, run through Optimize Cameras, to transfer initial corrections to the models.
According to the designed strategy (Table 1), a direct or indirect georeferencing of the models was performed, followed by a final BBA to readjust the model. The primary source of error in georeferencing comes from applying the linear transformation matrix to the model [33]. Potential nonlinear deformation components of the model were mitigated, and the sum of the reprojection error and the reference coordinate misalignment was minimised, by re-optimising the estimated point cloud and camera parameters on the basis of the known reference coordinates.
The point cloud densification algorithms were started as the last step; DEMs were calculated from the dense clouds, and the images were orthorectified based on the computed elevations. In general, orthorectification transforms the central projection of the original image into a parallel projection [34]. Consequently, displacements due to the tilt of the sensor and to the terrain relief were corrected. For the mosaicking step, the blending mode offers two options (mosaic and average) that determine how pixel values from different (overlapping) images are combined in the final texture layer. In this study, the selected mosaic options are shown in Table 2. Blending pixel values in mosaic mode does not mix image details of overlapping photos but uses only the image in which the pixel in question lies at the shortest distance from the image centre [33]. Four orthomosaics were exported for each scenario investigated: the first at the highest available resolution, and the others with the pixel size doubled, tripled, and quadrupled. Further down-sampling would not justify the use of UAV technology [43]. In particular, the Agisoft Metashape software made it possible to export the orthomosaics at the chosen resolution, resampling each time by bilinear interpolation.
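As an illustration of the down-sampling step, the following is a minimal sketch of bilinear resampling of a single band by an integer factor. It is not the internal Metashape implementation; the function name `resample_bilinear`, the list-of-rows band layout, and the sample values are assumptions made only for this example.

```python
def resample_bilinear(band, factor):
    """Down-sample a 2-D band (list of rows) by an integer factor.

    Each output pixel is interpolated bilinearly at the position of its
    centre expressed in input pixel coordinates (hypothetical helper).
    """
    rows, cols = len(band), len(band[0])
    out_rows, out_cols = rows // factor, cols // factor
    out = []
    for i in range(out_rows):
        # Centre of output row i in input pixel coordinates
        y = (i + 0.5) * factor - 0.5
        y0 = min(int(y), rows - 1)
        y1 = min(y0 + 1, rows - 1)
        wy = y - y0
        new_row = []
        for j in range(out_cols):
            x = (j + 0.5) * factor - 0.5
            x0 = min(int(x), cols - 1)
            x1 = min(x0 + 1, cols - 1)
            wx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y
            top = band[y0][x0] * (1 - wx) + band[y0][x1] * wx
            bottom = band[y1][x0] * (1 - wx) + band[y1][x1] * wx
            new_row.append(top * (1 - wy) + bottom * wy)
        out.append(new_row)
    return out


# Hypothetical 4 x 4 band; doubling the pixel size yields a 2 x 2 band
band = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [4, 6, 8, 10],
    [6, 8, 10, 12],
]
half_resolution = resample_bilinear(band, 2)
```

With a factor of 2, each output pixel falls midway between four input pixels, so the result reduces to the mean of each 2 × 2 block.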

Empirical Line Method
UAV imagery is highly influenced by the environmental conditions prevailing at the time of data acquisition [10,18]: the atmospheric composition (e.g., water vapour and aerosols) and solar illumination patterns have the greatest impact on the radiometric camera calibration.
Consequently, while images of the same scene acquired by the same sensor at different times may have different properties [33], images acquired by the same sensor in the same campaign may also contain noise due to lens distortions, systematic sensor errors, and variation in camera sensitivity across the same image [38].
Therefore, it is essential to carry out a radiometric calibration of the photogrammetrically derived orthomosaics so that they can be considered quantitatively and qualitatively comparable.
The orthomosaics were imported into the open-source software QGIS (3.16.5 'Hannover') [44]. Following the procedures adopted in [29], high and low reflectance targets were manually identified, as represented in the figure below.
[Figure: (a-c) represent the three scenarios as proposed in Table 1. The round icon identifies the north orientation of the areas.]
We then assessed the adequacy of the target sites against the criteria proposed in [34]: (a) high spatial homogeneity with respect to the spatial resolution of the image dataset, i.e., ideally, each target should cover an area of about 5 × 5 pixels in the reference images; (b) representativeness of the dynamic range of the radiance in the region; (c) low adjacency effects, i.e., targets located at an adequate distance from other volumetric scattering disturbances; (d) low slope effects, i.e., targets with flat or Lambertian surfaces; (e) low temporal variability of the spectral response, i.e., targets with a stable spectral response that do not show rapid changes due to short-term dynamic phenomena.
Using the raw digital number (DN) values per band extracted from each target, a linear relationship was constructed by empirically associating the DNs with the extreme percentage values of reflectance (range 0-100%), low and high, respectively. This method of calibrating the DN of each band is called the empirical line method (ELM). Precisely, ELM is a non-rigorous but basic approach to calibrate the DN of images to approximate units of surface reflectance when no further spectroscopic information is available on the ground [39], as in our work. It constructs a relationship between sensor radiance and surface reflectance by measuring spectrally stable targets and comparing these measurements with the respective image DNs [4]. Thus, prediction equations were derived that can account for changes in illumination and atmospheric effects. Due to the low altitude at which the measurements were taken and the unavailability of precise information, the impact of atmospheric effects was deliberately ignored [29].
The ELM for the RGB UAV-sensed data can be expressed by the following linear equation:
$$\rho(\lambda) = A \cdot DN + B \qquad (1)$$
where $\rho(\lambda)$ is the reflectance value for a specific band (range 0-100%), $DN$ is the raw digital number of the orthophoto, and $A$ and $B$ are terms that can be determined using a least-squares fitting approach [29]. Although it is widely used with reasonable results, radiometric correction using ELM can introduce noise, and caution should be exercised in its application. Indeed, most digital cameras have built-in algorithms that use a curvilinear function to transform electromagnetic radiation into digital signals in order to simulate the way human eyes perceive grey. Consumer cameras are designed to take pictures that look good, not to capture scientific data for research, so the relationship between surface reflectance and raw image DNs remains poorly decipherable for these cameras [4]. Goodness-of-fit measures, such as the coefficient of determination ($R^2$), were used to assess the accuracy of the ELM correction so that the suitability of the regression could be quantitatively proven [34]. The A and B values of Equation (1) were estimated and used in the raster calculator of the QGIS software to perform the radiometric calibration of each band. To validate the consistency of the performed calibrations, 15 points per scenario were manually identified among vegetation, bare soil, and asphalt. Their spectral signatures were compared with those commonly accepted in the literature.
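The fitting step described above can be sketched as follows for a single band. This is a minimal illustration under stated assumptions, not the workflow actually run in QGIS: the linear form reflectance = A · DN + B is assumed, the function names `fit_empirical_line` and `calibrate` are hypothetical, and the DN samples are invented for the example.

```python
from statistics import mean


def fit_empirical_line(dns, reflectances):
    """Ordinary least-squares fit of reflectance = A * DN + B for one band.

    Returns the slope A, the intercept B, and the coefficient of
    determination R^2 of the fitted line.
    """
    dn_mean = mean(dns)
    rho_mean = mean(reflectances)
    sxx = sum((d - dn_mean) ** 2 for d in dns)
    sxy = sum((d - dn_mean) * (r - rho_mean) for d, r in zip(dns, reflectances))
    a = sxy / sxx
    b = rho_mean - a * dn_mean
    ss_res = sum((r - (a * d + b)) ** 2 for d, r in zip(dns, reflectances))
    ss_tot = sum((r - rho_mean) ** 2 for r in reflectances)
    r2 = 1.0 - ss_res / ss_tot
    return a, b, r2


def calibrate(dn, a, b):
    """Convert a raw DN into approximate reflectance (%) with the fitted line."""
    return a * dn + b


# Hypothetical 8-bit DN samples taken from low and high reflectance targets,
# empirically associated with 0% and 100% reflectance respectively
dark_dns = [12, 15, 10, 14]
bright_dns = [240, 236, 244, 238]
dns = dark_dns + bright_dns
rhos = [0.0] * len(dark_dns) + [100.0] * len(bright_dns)

a, b, r2 = fit_empirical_line(dns, rhos)
mid_grey = calibrate(128, a, b)  # reflectance estimate for a mid-range DN
```

Because only the two extremes are sampled, a high R² here says that the two target groups are well separated and internally homogeneous; it does not by itself confirm linearity of the camera response in between.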

Vegetation Indices
Once the orthomosaics were radiometrically calibrated, various visible vegetation indices were computed with pixel values ranging between 0 and 1. The ten (10) VIs used in this work and their formulas are shown below. In particular, referring to the study carried out in [29], the following vegetation indices were assessed for all case studies:
• Normalized green-red difference index (NGRDI) [45]: $(\rho_G - \rho_R)/(\rho_G + \rho_R)$ (2)
• Green leaf index (GLI) [46]: $(2\rho_G - \rho_R - \rho_B)/(2\rho_G + \rho_R + \rho_B)$ (3)
• Visible atmospherically resistant index (VARI) [47]: $(\rho_G - \rho_R)/(\rho_G + \rho_R - \rho_B)$ (4)
• Triangular greenness index (TGI) [48]: $-0.5\,[(\lambda_R - \lambda_B)(\rho_R - \rho_G) - (\lambda_R - \lambda_G)(\rho_R - \rho_B)]$ (5)
• Red-green ratio index (IRG) [49]: $\rho_R - \rho_G$ (6)
• Red-green-blue vegetation index (RGBVI) [50]: $(\rho_G^2 - \rho_B \rho_R)/(\rho_G^2 + \rho_B \rho_R)$ (7)
• Red-green ratio index (RGRI) [51]: $\rho_R/\rho_G$ (8)
where $\rho_B$ is the reflectance at the blue band, $\rho_G$ is the reflectance at the green band, $\rho_R$ is the reflectance at the red band, $\lambda_B$ is the wavelength of the blue band, $\lambda_G$ is the wavelength of the green band, and $\lambda_R$ is the wavelength of the red band.
As can be seen in Equation (5), calculating the TGI requires the peak wavelength sensitivity of the RGB camera. Therefore, the index calculation depends on the assumption that the user knows the peak wavelength sensitivity of the camera used. Low-cost RGB cameras are often not supplied with the specifications of the mounted CMOS sensors, as in our cases [17]. It was therefore chosen to set default values for all cases: $\lambda_B$ = 480 nm, $\lambda_G$ = 560 nm, and $\lambda_R$ = 655 nm.
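For a single pixel, the computation of the indices can be sketched as below. This is only an illustration: it assumes the commonly published definitions of seven of the indices named above, the default wavelengths stated in the text, and denominators that are non-zero; the function name `visible_indices` and the sample reflectances are hypothetical.

```python
def visible_indices(r, g, b, lam_r=655.0, lam_g=560.0, lam_b=480.0):
    """Visible-band vegetation indices for one pixel.

    r, g, b are calibrated reflectances (fractions between 0 and 1);
    lam_* are the assumed peak wavelengths in nm used by the TGI.
    Denominators are assumed to be non-zero for the pixel at hand.
    """
    return {
        "NGRDI": (g - r) / (g + r),
        "GLI": (2 * g - r - b) / (2 * g + r + b),
        "VARI": (g - r) / (g + r - b),
        "TGI": -0.5 * ((lam_r - lam_b) * (r - g) - (lam_r - lam_g) * (r - b)),
        "IRG": r - g,
        "RGBVI": (g * g - b * r) / (g * g + b * r),
        "RGRI": r / g,
    }


# Hypothetical vegetation-like pixel: green reflectance dominates
vi = visible_indices(r=0.10, g=0.20, b=0.05)
```

In a raster workflow, the same expressions would be applied band-wise, for example in a raster calculator, after masking pixels whose denominators vanish.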
The results were then analysed and compared using 150 random points automatically identified in the orthomosaics [54,55].

Classification Algorithm Feedback
The raster files of the radiometrically corrected RGB bands and the maps of the VIs that proved most significant in terms of results, as subsequently explained in Section 3.3, were imported into the Sentinel Application Platform (SNAP) software [56]. SNAP is a common open-source architecture for the ESA Toolboxes, ideal for the exploitation of earth observation data. As its name implies, it is mainly designed for processing data from the Copernicus Sentinel missions [56], but it is also suitable for different operations on other data [54,55].
SNAP integrates a multitude of tools for exploring and processing multisource data. For the purposes of this work, this platform offers the possibility to run supervised classification algorithms. Among these, random forest (RF) is a widespread supervised classification and tree regression technique [57]. Specifically, the RF algorithm randomly and iteratively samples data and variables to generate a large set, called a forest, of classification and regression trees [58]. The classification output describes the statistics of many decision trees, resulting in a more robust model than can be obtained from a single decision tree produced by a single execution of the technique [57]. Thus, the regression output from RF effectively represents the average of all regression trees grown in parallel without pruning [59]. The iterative nature of RF gives it a distinct advantage over other methods in that the data is effectively bootstrapped, thus feeding random subsets of the training data, to obtain more robust predictions and reducing the correlation between trees [60].
For each scenario, vector containers were generated for the training areas of the classes vegetation, asphalt, and bare soil. For each class, 10 areas were manually drawn, uniformly distributed over the whole scenario and covering the radiometric heterogeneity of the class. Subsequently, 30 pins were placed for each class to act as validation points, i.e., pixels whose class membership is certified and against which the classification prediction is verified. The RF algorithm was first run using the generated training areas and only the RGB bands as input, after which the information from the most significant vegetation maps, calculated in the previous step, was added. From the results extracted for each scenario at the different resolutions, the confusion matrices and F-scores [62] for each class were compiled according to the criteria defined in [61].
The F-score is an efficient metric of the accuracy of a test [62]. It is calculated by combining the precision and recall of the test: precision is the number of true positive results divided by the number of all results identified as positive, including those identified incorrectly, while recall is the number of true positive results divided by the number of all samples that should have been identified as positive. The score takes values in the range from 0 to 1, where 1 represents perfect precision and recall, whereas values close to 0 indicate poor accuracy, in which case a reorganisation of the classification process is inevitable.
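The per-class metrics described above can be sketched from a confusion matrix as follows. The sketch assumes the F-score is the usual F1 form, i.e., the harmonic mean of precision and recall, which the text does not state explicitly; the function name `f_scores` and the matrix values are hypothetical.

```python
def f_scores(confusion, labels):
    """Per-class precision, recall and F1 from a confusion matrix.

    confusion[i][j] is the number of validation points of true class i
    that were predicted as class j.
    """
    scores = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        predicted = sum(row[k] for row in confusion)  # TP + FP
        actual = sum(confusion[k])                    # TP + FN
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


labels = ["vegetation", "asphalt", "bare soil"]
# Hypothetical confusion matrix for 30 validation pins per class
confusion = [
    [27, 1, 2],
    [0, 28, 2],
    [3, 2, 25],
]
scores = f_scores(confusion, labels)
```

Reporting the score per class, rather than a single overall accuracy, makes it visible when a classifier separates vegetation well but confuses asphalt with bare soil.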

Results
Based on the photogrammetric processing chain described in Section 2.2, four orthomosaic solutions were exported for each scenario investigated. The spatial resolutions (m/pixel) established are reported in Table 3. Once imported into the open-source platform QGIS, each orthomosaic underwent the same processing workflow.
Figure 3 shows the empirical lines obtained and the respective equations after the implementation of the ELM. The coefficient of determination $R^2$ of each regressed empirical line indicated a good fit to the manually chosen points. The $R^2$ values obtained at the different spatial resolutions are comparable. There is a slight, but not significant, decrease only in solution {4}, indicating an increase in prediction errors. Future investigations could examine stronger down-sampling cases, thus identifying possible limitations of the methodology in finding these regression lines.

Radiometric Calibration of the Raw Orthophotos
The values of coefficients A and B, as given in Equation (1), derived from the regression line equations in Figure 3, were used for the radiometric calibration of each band of each resolution solution listed in Table 3.
To confirm the radiometric correction, the reflectance values of each calibrated raster were extracted, and the spectral signatures of points falling in vegetated areas, asphalt, and bare soil were constructed (Figure 4).
As already described by [29], an unambiguous trend among the various selected points was difficult to extrapolate. In this process chain too, it was not deemed necessary to distinguish among different behaviours within each class. For example, in the case of vegetation points, no distinction was made between type and state of health, while for asphalt, the age of laying was not known. The trends described in Figure 4 nevertheless reasonably track the behaviour observed in the literature [29,34].

Vegetation Indices
After obtaining the radiometrically calibrated raster files of each band for each solution, the ten (10) vegetation indices listed in Section 2.3 were calculated. To validate the results, 150 random points were located, as shown in Figure 5. Figure 6 shows the distribution of these points between vegetated and non-vegetated areas. When classifying these points, those with reflectance values per band outside the range (0:1) were removed. These points are affected by the radiometric calibration methodology, which, as set out in Section 2.4, is still a raw and not entirely effective method. At other points, distortions caused by the low quality of the sensors and/or artefacts generated by the SfM-MVS procedures result in erroneous reflectance values [13]. The vegetation indices were then estimated for all remaining points.
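The removal of points with out-of-range reflectance can be sketched as a simple filter. The sketch assumes that the range (0:1) is an open interval and that each sampled point is stored as a dictionary with keys `r`, `g`, and `b`; both the data layout and the function name `keep_valid_points` are hypothetical.

```python
def keep_valid_points(points):
    """Keep only points whose R, G and B reflectances all lie inside (0, 1).

    The open interval is an assumption; the source only states that values
    outside the range (0:1) were removed.
    """
    return [
        p for p in points
        if all(0.0 < p[band] < 1.0 for band in ("r", "g", "b"))
    ]


# Hypothetical sampled points after radiometric calibration
points = [
    {"id": 1, "r": 0.10, "g": 0.20, "b": 0.05},   # valid
    {"id": 2, "r": 1.08, "g": 0.95, "b": 0.90},   # red band above 1
    {"id": 3, "r": 0.30, "g": -0.02, "b": 0.25},  # green band below 0
    {"id": 4, "r": 0.45, "g": 0.40, "b": 0.38},   # valid
]
valid = keep_valid_points(points)
```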

Statistics
Given the distinction between points in vegetated and non-vegetated areas (Figure 6), the two statistical populations for each VI, for each resolution solution and scenario, were subjected to a t-test at a 95% confidence level to assess whether the difference between them was significant for subsequent statistical inference. The test therefore returned acceptable and not acceptable results with respect to the chosen confidence level. In particular, a not acceptable result already attests to a complete inability to separate vegetated and non-vegetated areas, since the mean values of the index in the two populations cannot be regarded as significantly different.
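The significance check can be sketched in Python as follows. The text does not state which t-test variant was used, so this example assumes Welch's unequal-variance statistic and, for simplicity, compares it with the two-sided normal quantile, a large-sample approximation of the t critical value. The function names and sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(veg, nonveg):
    """Welch's t statistic for two independent samples with unequal variances."""
    standard_error = sqrt(variance(veg) / len(veg) + variance(nonveg) / len(nonveg))
    return (mean(veg) - mean(nonveg)) / standard_error


def is_acceptable(veg, nonveg, confidence=0.95):
    """True when the two means differ significantly at the given confidence level.

    Large-sample approximation: |t| is compared with the two-sided normal
    quantile instead of the exact Student t quantile.
    """
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return abs(welch_t(veg, nonveg)) > critical


# Hypothetical index values at vegetated and non-vegetated points
veg = [0.30, 0.32, 0.28, 0.31, 0.29]
nonveg = [0.10, 0.12, 0.09, 0.11, 0.08]
print(is_acceptable(veg, nonveg))  # True: the populations are well separated

similar_a = [0.10, 0.20, 0.30, 0.40]
similar_b = [0.15, 0.25, 0.35, 0.30]
print(is_acceptable(similar_a, similar_b))  # False: this would be an NA result
```

With samples as small as those in this example, the exact t quantile (for instance from a statistics library) would be preferable to the normal approximation.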
The normalised difference between the mean value of each index over vegetated areas and over non-vegetated areas was adopted as the descriptor of the propensity of each vegetation index to separate the two classes. The results are presented in Table 4. Blue indicates a negative normalised difference value, while red indicates a positive value, per vegetation index and spatial resolution. Lighter colours thus indicate a low degree of separability. The acronym NA identifies the not acceptable results defined above, i.e., those for which the difference between the index means is not significant at the 95% confidence level adopted in the t-test.
The results obtained in case (a), Fasoula (Cyprus), showed the most favourable mean acceptability ratio in the t-test, i.e., the smallest share of not acceptable results, equal to 0.175.
Regular surfaces, low vegetation, and clearly distinguishable feature points certainly make orthomosaics more workable for vegetation indices. A somewhat comparable average acceptability ratio was found in scenario (c_mask), where a ratio of 0.2 was recorded across all resolutions. In particular, masking the water areas in the orthomosaic improved its interpretability by the indices, returning acceptable values of separability between the classes investigated. Case (c), by contrast, showed a mean ratio of 0.65 across the analysed resolutions. Particularly remarkable were the values assumed by the IRG index in (c_mask), compared with the corresponding NA results in case (c). For the following statistics, it was therefore preferable to focus on the (c_mask) case.
Last, case (b) was characterised by an average acceptability ratio of 0.5. Only this scenario showed a linear improvement in the acceptability ratio as the spatial resolution decreased.
The highest magnitude was recorded in case (c) at resolution {3}, i.e., without masking, for the TGI index, with a value of 6337.3%. Moreover, this index had an acceptability ratio equal to 0, proving functional in all cases and taking on very consistent values.
The ExG index was also functional in every scenario (ratio equal to 0) and at every resolution adopted; the most significant scores were obtained in scenarios (b) and (c_mask). A percentage deviation of more than 20% was noted between scenarios (c) and (c_mask).
The VARI index was not acceptable in any of the cases analysed, except for case (b) at resolution {4}, thus presenting the highest ratio among the indices (0.9375). Moreover, its score was not high enough for it to be considered functional in separating the classes. As already reported in [29], the CIVE index (ratio 0.0625) obtained the lowest score for all the resolution solutions in each scenario; among them, scenario (c_mask) was the most responsive. The NGRDI behaved similarly to the VARI index and was only acceptable at resolutions {2} and {3} of scenarios (a) and (c), with insignificant scores below 15%. Its acceptability ratio was 0.75. Scenario (c) did not respond to the MGRVI (ratio 0.4375), RGRI (ratio 0.4375), RGBVI (ratio 0.375), IRG (ratio 0.3125), and GLI (ratio 0.4375) indices, while significant scores were obtained in both (c_mask) and (a), except for the RGRI index in the latter scenario. Noteworthy values were recorded in scenario (c_mask) for the IRG index.
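The acceptability ratios quoted above are consistent with the share of NA entries among the Table 4 cells considered: for VARI, 15 NA results out of 16 scenario-resolution combinations give 0.9375. A minimal Python sketch under that reading follows; the function name and the boolean layout are illustrative assumptions.

```python
def acceptability_ratio(is_na):
    """Share of not acceptable (NA) t-test outcomes in a collection of results."""
    flags = list(is_na)
    if not flags:
        raise ValueError("at least one t-test outcome is required")
    return sum(flags) / len(flags)


# VARI: 4 scenarios x 4 resolutions = 16 outcomes, acceptable only for (b) at {4}
vari = [True] * 15 + [False]
# TGI: acceptable in every scenario and at every resolution
tgi_flags = [False] * 16

print(acceptability_ratio(vari))       # 0.9375
print(acceptability_ratio(tgi_flags))  # 0.0
```

Under this reading a lower ratio is better, which matches the description of TGI and ExG (ratio 0) as functional in all cases.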
In general, case study (c_mask) (archaeological area with water masking) tended to give large differences between vegetated and non-vegetated areas regardless of the vegetation index applied, indicating that postprocessing the images by removing areas of ambiguity, such as water, improves interpretability in the analysis. It was not possible to identify the most challenging environment in which to discriminate vegetation from other areas at any resolution solution. The general trend suggested that as the ground sampling distance increases, i.e., at lower resolutions, ambiguities or noise in vegetated areas are reduced, thus improving discriminability. From Table 4, it can also be deduced that, within this trend, some resolutions work better than other, lower resolutions and are therefore optimal in describing the radiometric information.
Based on the results of Table 4, a performance comparison between indices was set up, i.e., a normalised difference, for all case studies, between each vegetation index and the NGRDI index taken as reference. The results of this analysis are shown in Figure 7 for each scenario at the various spatial resolution solutions. The normalised difference expresses the percentage difference between the mean value of each index VI_i and that of the NGRDI index, where each mean value was normalised to the maximum value of the index among vegetated points. Therefore, in Figure 7, high values imply that the VI_i index performed better than the NGRDI index; negative values, on the contrary, suggest that the specific index performed worse. Indices with values around zero have a performance comparable to that of the reference index.
From the results in Figure 7, it was observed that the IRG index performed positively in comparison to the NGRDI index for all case studies at any resolution solution. In most cases, the VARI index also exhibited positive behaviour relative to the reference index, except for scenario (a) at resolution {1}, in which it took on a negative but near-zero value of −0.48%. However, neither IRG nor VARI showed efficiencies of more than 10%, and the VARI index performed almost identically to the NGRDI index. Moreover, given the considerations from Table 4, the VARI index cannot be considered completely efficient. With its acceptability ratio of 0.3125, the IRG index was shown to be non-functional in scenario (c); based on Figure 7, however, it was considered efficient for this work. Its efficiency decreased slightly in case study (a), increased in case (b), and peaked at resolution {3} in case (c_mask).
The TGI vegetation index performed irregularly: in case (a) it returned values of over 60%, up to a maximum of 121%, across the different resolutions, while in cases (b) and (c_mask) it returned very negative values, except in case study (b) at the first resolution {1}, where it even reached a value of 144%.
The remaining vegetation indices, on the other hand, showed negative returns compared to the reference index, although not excessively high ones (less than −20%). It was not possible to describe a regular behaviour of the efficiency of the indices as the adopted resolution varies. In this regard, Agapiou [29] stated that the optimal index is not unique across case studies, which is also in line with the previous results in Table 4. Thus, it is not possible to deduce a direct relationship between the resolution of the orthomosaic and the index returned.

Supervised Classification Responses
After performing the supervised classification procedures using the RF algorithms, the validation metrics were extracted. In particular, the labels assigned to the 90 pins placed were compared with those predicted by the software for each scenario, at each resolution, and for each classification mode (RGB bands, adding the TGI band, adding the IRG band); the confusion matrices were extracted, and from these the F-scores were computed. Table 5 summarises the F-scores for each class: vegetation (V), asphalt (A), and bare soil (B). For the third scenario, the unmasked data (c) were preferred in order to analyse the performance of the classification algorithms at a basic level of image processing, i.e., with radiometric correction only.
Table 5. Summary of the F-score values calculated for the cases under study, at various spatial resolutions and for each class: V identifies vegetation, A stands for asphalt, and B represents bare soil. Colours tending towards blue identify low F-score values for each resolution in the three classification modes; conversely, colours tending towards red identify high F-score values. The average values of the F-scores per case study and classification mode are in bold.
In view of the results shown in Table 4 and Figure 7 in Section 3.3, a reasonable interest in the vegetation indices TGI and IRG and their behaviour was inferred. In Table 4, the TGI index showed the best acceptability ratio, i.e., it was consistently functional with always significant values; however, while it took on significant values in Figure 7, these were not positive in all cases. The IRG index, on the other hand, although with much smaller values, always presented effective values in Figure 7. Although the IRG index did not work for scenario (c), it became effective again in scenario (c_mask), which increases the interest in the results obtainable from the classification in (c). The classifications were therefore carried out first using only the radiometrically corrected RGB bands, and then adding the TGI and IRG vegetation maps separately as additional inputs.
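The step from confusion matrix to per-class F-score can be sketched in Python as follows. The example assumes the F-score is the usual F1 (harmonic mean of precision and recall) and uses a hypothetical 3 × 3 matrix for 90 validation pins; the counts are illustrative and are not taken from Table 5.

```python
def f_scores(confusion, labels):
    """Per-class F1 from a square confusion matrix.

    Rows are reference labels and columns are predicted labels, so
    F1 = 2 * TP / (row total + column total) for each class.
    """
    scores = {}
    for k, label in enumerate(labels):
        true_positive = confusion[k][k]
        reference_total = sum(confusion[k])                 # TP + FN
        predicted_total = sum(row[k] for row in confusion)  # TP + FP
        denominator = reference_total + predicted_total
        scores[label] = 2 * true_positive / denominator if denominator else 0.0
    return scores


# Hypothetical counts for 90 pins: vegetation (V), asphalt (A), bare soil (B)
labels = ["V", "A", "B"]
confusion = [
    [28, 1, 1],   # reference V
    [2, 20, 8],   # reference A
    [0, 6, 24],   # reference B
]

scores = f_scores(confusion, labels)
print({label: round(value, 3) for label, value in scores.items()})
# {'V': 0.933, 'A': 0.702, 'B': 0.762}
```

In this invented example the vegetation class scores well while asphalt and bare soil are confused with each other, the kind of pattern the following paragraphs discuss.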

Given the considerations stated in Section 3.1, a coarse radiometric calibration such as that performed with the ELM does not allow a clear distinction to be made between and within the image components. In the classification, this translates into a reduced ability to distinguish classes such as asphalt and bare soil, which in several cases may have spectral similarities (e.g., in the case of very old asphalt), as can be observed in Table 5.
Focusing on scenario (a), it is evident that the classifications performed, at all the analysed resolutions, benefited from the TGI index, while the IRG index produced limited negative effects with respect to the RGB base case. Contrary to the predictions suggested by Table 4 and Figure 7, which pointed to case {4}, spatial resolution case {2} with the TGI index was the most efficient in terms of performance. Indeed, the lower-resolution case {4} was almost comparable in terms of average performance, but the RF algorithms implemented in the SNAP software benefited from the better resolution of {2} to build better decision trees.
In scenario (b), the values obtained in all cases were noteworthy, with those obtained using the vegetation indices standing out. It was not possible to identify a unique trend among them: at each resolution the F-scores varied, with limited positive or negative effects for each class. In general, the lower resolutions produced higher values, supporting the hypothesis that a reduction in resolution reduces the distortions and noise in pixels caused by SfM techniques. Among all cases, the RF algorithms produced the most effective results using the IRG information in solution {4}. This outcome can be considered consistent with the statistics discussed in the previous section.
Finally, scenario (c) presented some emblematic values. Although the F-score values for the vegetation class were considered valid in all cases, the other two classes presented strongly fluctuating values, in some cases even lower than 0.5. For the extraction of the vegetation class, the highest F-score value was obtained using the TGI index in the {3} solution, in agreement with Table 4. In fact, this resolution presented the highest performance values, proving to be optimal for the classifications of this scenario. Among the classifications at this resolution, the most efficient for the three classes analysed was the one that used only the RGB bands. This shows, once again, that in this scenario the usable indices can only become more significant after a masking procedure. Figure 8 shows the most successful classification maps obtained.

Discussion
The results shown in the previous section offer several considerations for discussing vegetation extraction from orthomosaics in various environments and at various spatial resolution solutions.
The application of vegetation indices based on visible bands, possibly acquired with a non-metric camera, highlights the potential for making the discrimination of vegetated areas widespread and routine. To understand their limitations and efficiencies, the results presented in Table 4 indicate how the different indices achieve high or low separability between vegetated and non-vegetated areas. In several cases, some indices could not even return significant values and were therefore not acceptable at the 95% confidence level of the t-test. As already documented in other recent works [18,25,29], no single index performs in the same way across the various case studies. Consequently, each index is suited to particular environmental contexts. Thus, there is a need to build an abundant collection of cases from which to statistically deduce any similarities between the various indices and the analysed contexts.
In this regard, the values of the average acceptability ratios allowed certain issues to be deduced. The context of some orthomosaics can be quite complex, as in case study (c). The results in Table 4 showed how masking highly ambiguous areas, such as areas with water, markedly improves the interpretability of the images. This was also demonstrated by the results obtained in Table 5 about the efficiency of the classifications. Indeed, masking leads to the exclusion of false-positive points. Differences in the sensitivity of cameras to capture backscattered reflection values in specific wavelengths can be significant, as has been shown in the past by other studies [38].
In addition to the spectral complexity and heterogeneity of the scenario, other factors can affect the overall performance of the indices, e.g., inhomogeneous lighting (low-cloud effect), sun exposure (shaded and partially sunny areas), and the presence of dense vegetation. In scenario (b), photogrammetric processing of densely vegetated areas generated many noisy and distorted areas. Generally, these areas appeared as a constant source of reconstruction errors due to the low efficiency of SfM techniques in defining unambiguous points. The matching algorithms are weak in identifying stable tie points in vegetated areas, which generates artefacts and distortions that are challenging to resolve [13]. As shown in Table 4, this caused a loss of efficiency of the extraction techniques in vegetated areas. Table 5, however, shows how a lower spatial resolution can benefit the classification results, as distortions and noise are reduced in the subsampling. Scenario (a) proved to be more advantageous for applying the vegetation indices, as it was characterised, albeit within a heterogeneous context, by points and areas not subject to high noise.
Moreover, only in this scenario can the TGI index be considered very incisive, given the observations deduced from Tables 4 and 5 and Figure 7. This supports what had already been stated: whether one vegetation index is more or less efficient than another is indeed linked to the context surveyed. Another important finding was that indices whose formulation includes a ratio between combinations of visible bands (Equations (2)-(4), (7)-(9)) did not produce acceptable results in scenarios (b) and (c).
Recent studies [20] have shown that the optimal resolution for remote sensing applications is related to the spatial characteristics of the targets under examination and their spectral properties. Indeed, in [20] the high resolution of an orthomosaic was not always optimal for a given vegetation index in the various case studies. With low-cost camera sensors, the channels are assumed to overlap and thus cannot independently record distinct wavelength ranges. As no information on this is available, however, its relevance cannot be measured. This issue propagates first to the radiometric information recorded in the pixels and then to the formulation of the indices, whose components thus become correlated, as indeed observed.
Considering Table 4, resolution {3} presented a reasonable acceptability ratio for all scenarios, and this was also reflected in the classification results in Table 5. This was not consistent with the magnitude of separability of the examined indices or with their efficiency relative to the NGRDI reference index (Figure 7). In fact, a reduction in spatial resolution smooths out noise or distortions introduced in the photogrammetric generation of orthomosaics or derived from the poor quality of the starting images. Figure 7, in contrast, makes it evident that no vegetation index behaves uniformly when the resolution changes: each has an optimal resolution for each analysed context, which indicates the complexity of extrapolating a direct relationship between these parameters.
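The smoothing effect of a coarser resolution can be illustrated with a small Python sketch. It assumes that resolution is reduced by averaging non-overlapping pixel blocks, which is only a stand-in for however the lower-resolution orthomosaics were actually produced, and it uses a synthetic homogeneous surface with Gaussian noise; all names and values are hypothetical.

```python
import random
from statistics import pvariance


def block_mean(raster, factor):
    """Downsample a 2D list by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(raster), len(raster[0])
    coarse = []
    for r in range(0, rows - rows % factor, factor):
        coarse_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [raster[r + i][c + j] for i in range(factor) for j in range(factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


def flatten(raster):
    return [value for row in raster for value in row]


# Synthetic homogeneous surface (reflectance 0.4) with per-pixel noise
random.seed(0)
fine = [[0.4 + random.gauss(0, 0.05) for _ in range(64)] for _ in range(64)]
coarse = block_mean(fine, 4)

var_fine = pvariance(flatten(fine))
var_coarse = pvariance(flatten(coarse))
print(var_fine > var_coarse)  # True: averaging reduces the pixel-level noise
```

For independent noise, averaging 4 × 4 blocks reduces the variance by roughly a factor of 16, at the cost of blurring real detail, which is one way to read the observation that an intermediate resolution can be optimal.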
Comparing the observations of Table 4 and Figure 7, only the IRG index presented profitable characteristics for all scenarios at each resolution solution. Although its efficiency compared to the NGRDI was less than −5%, wide use of the ExG index cannot be excluded: it was indeed consistent in discriminating vegetated areas in any context and at any resolution solution.
A key step was to obtain an overview of the impacts of processing and statistics on pixel-based classification algorithms. Analysing the metrics computed from the confusion matrices (Table 5), no linear dependence between the spatial resolutions and the incidence of the bands used in the classification was noticeable. Indeed, this incidence itself depends on the analysed scenario and, as highlighted in (c), some manipulation of the dataset at the base level can improve its interpretability by the classification algorithms. In all the analysed cases, the orthomosaics obtained from photogrammetric surveys can be considered valid tools for vegetation extraction, while the distinction between non-vegetation classes was more difficult. Finally, this processing chain proved a valid means of extracting vegetation classes, accessible to a wide range of users and application cases.

Conclusions
A large body of literature notes the potential and versatility of UAVs. The use of these relatively low-cost platforms, combined with the strong development of SfM-MVS techniques, can return a wide range of photogrammetric products for small and medium areas, including high-resolution orthomosaics, i.e., with higher resolution than traditional aerial or satellite observations. In many applications, mainly related to the monitoring of agricultural areas, forests, etc., the ability to discriminate and extract vegetated areas is indispensable. The use of sophisticated sensors capable of capturing spectral information in the near-infrared band would facilitate operations. However, in most cases, UAVs are equipped with low-cost sensors sensitive to the visible part of the spectrum (RGB), making the detection of vegetated areas quite challenging. Furthermore, when working with radiometric information, the radiometric calibration of the images is indispensable to convert the raw digital numbers (DN) into reflectance values. A robust calibration requires field campaigns involving a spectroscopic survey of defined targets to obtain a good approximation of the backscattered radiance of the same targets later observed in the images.
In this work, the ELM radiometric correction technique of orthomosaics generated from images acquired by UAV of three different case studies, with different contexts, RGB sensors, and different spatial resolution solutions has been explored. This easily applicable procedure does not require any knowledge of ground targets or field campaigns with spectroradiometers and spectral reflectance targets. The calibrations applied to the various cases screened returned spectral signatures of control points extracted in vegetation areas, asphalt, and bare soil in line with those widely accepted in other literature.
Ten VIs based on the visible bands were computed. The results were further processed to examine the performance of each index and then to quantify their impact on the RF supervised classification procedures. From the results of this study, the following aspects can be highlighted:

• The performance of each index varied for each case study, as already observed in other works. Therefore, to estimate the performance of the indices in general, it is essential to construct a broad case history covering as many contexts as possible.
• The TGI index, able to return very significant and functional values in terms of separability between vegetated and non-vegetated areas, performs better than the NGRDI index, taken as a reference, only in a regular context without ambiguous areas. The IRG index, on the other hand, performs well in all scenarios but with moderate performance.
• High resolution of an orthomosaic was not frequently optimal for vegetation indices in the various case studies; by reducing the resolution, the noise in each pixel is smoothed out, improving the radiometric information. In fact, the classification algorithms gave optimal results for {3} resolutions, demonstrating that very high-resolution datasets are not always a guarantee of more precise results.
• The masking of areas that are strongly characterised by ambiguity, such as those in the presence of water, improves their interpretability by the indices and increases their performance.
• In areas with dense vegetation, the reduced ability of SfM-MVS techniques to establish and triangulate unambiguous junction points produces artefacts or obvious distortions that compromise VIs' performance in extracting correct information.
• Looking at the average performance of the RF classification algorithms, for each case analysed, it emerged that RGB orthomosaics can be considered a valid source for generic vegetation extraction.
This study's results can be applied to any RGB orthomosaic, whether taken from a low-altitude system or from aerial imagery. The large fleet of low-cost ready-to-fly (RTF) UAVs equipped mainly with inexpensive RGB sensors will continue to grow, and the approach adopted in this work is an opportunity to fully exploit the masses of data that can be acquired. In the future, targeted VIs will be developed to address specific needs, making vegetation extraction a simpler and more straightforward procedure.
Given the considerations that emerged about the behaviour of the VIs at varying spatial resolution, future studies can investigate the variability of the same visible-band vegetation indices across multitemporal UAV acquisitions of the same scenario.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not Applicable.