Sequential PCA-based Classiﬁcation of Mediterranean Forest Plants using Airborne Hyperspectral Remote Sensing

: In recent years, hyperspectral remote sensing (HRS) has become common practice for remote analyses of the physiognomy and composition of forests. Supervised classiﬁcation is often used for this purpose, but demands intensive sampling and analyses, whereas unsupervised classiﬁcation often requires information retrieval out of the large HRS datasets, thereby not realizing the full potential of the technology. An improved principal component analysis-based classiﬁcation (PCABC) scheme is presented and intended to provide accurate and sequential image-based unsupervised classiﬁcation of Mediterranean forest species. In this study, unsupervised classiﬁcation and reduction of data size are performed simultaneously by applying binary sequential thresholding to principal components, each time on a spatially reduced subscene that includes the entire spectral range. The methodology was tested on HRS data acquired by the airborne AisaFENIX HRS sensor over a Mediterranean forest in Mount Horshan, Israel. A comprehensive ﬁeld-validation survey was performed, sampling 257 randomly selected individual plants. The PCABC provided highly improved results compared to the traditional unsupervised classiﬁcation methodologies, reaching an overall accuracy of 91%. The presented approach may contribute to improved monitoring, management, and conservation of Mediterranean and similar forests. AisaFENIX HRS sensor, and this study reports on the outcome of the approach.


Introduction
A wide range of environmental and ecological studies, including studies of forest and vegetation landscapes, require reliable information on land cover at local to global scales [1,2]. In particular, analyses of the composition and structure of vegetation, especially in diverse forests, are important for understanding ecosystem functioning [3].
In recent years, the use of hyperspectral remote sensing (HRS) data has become common for identifying the composition and physiognomy of forests across large areas [4][5][6][7]. The high spectral resolution of HRS (50-100 nm band width) allows identification of each classified land-cover cluster according to its known spectral signature, thereby enabling detailed analyses of land covers such as mineral composition [8], soil type [9], and vegetation structure and composition [8][9][10][11][12][13][14][15][16][17][18][19]. However, the ability to interpret and accurately classify the observed data is still constrained by factors such as: sensor attributes, sun altitude and azimuth, quality of ground-truth information, and the method of classification [20]. Seasonal and phenological changes contribute additional uncertainty in the classification of vegetation cover [21,22].
The large volume of data provided by HRS allows for a better distinction between classes, but significantly increases the complexity of the data processing. One of the common preprocessing methods applied to reduce the dimensionality of HRS data is principal component analysis (PCA) [23][24][25]. PCA is an eigenvector-based multivariate statistical analysis that applies eigenvalue decomposition of the data covariance (or correlation) matrix to convert correlated variables across all spectral bands into a set of values of linearly uncorrelated variables termed principal components (PCs). The number of PCs is equal to the number of bands, which are often visualized as grayscale band images presenting the "component score".
The transformation of the data obtained by PCA is defined such that the first PC displays the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn accounts for the next highest possible variance. In most cases, a second stage of inverse transformation of the PCA back to the original values is performed using only the first few PCs, without the noise or the autocorrelation between spectral bands. However, using only the coherent bands for the inverted image may result in some data loss [23,[26][27][28]. A mathematical interpretation and historical review of PCA can be found in Green et al. [24], Jensen [26], and Rodarmel and Shan [25].
The PCA-reduced image is then commonly subjected to supervised or unsupervised classification approaches applied either to the inverted image or directly to the PCs [28][29][30][31][32][33][34][35][36][37][38][39]. More examples of PCA use in remote-sensing classification can be found [28,[40][41][42][43][44][45][46][47][48]. However, in most of those studies, the main purpose for using the PCA was image enhancement and reduction of the autocorrelation between the spectral bands prior to a conventional classification scheme. During the last two decades, PCA techniques have been extensively applied for extracting spectral features to be used in supervised classifications [29,36]. However, small classes that are represented by relatively fewer pixels are more likely to remain undetected by the first PCs of the PCA. This is also the case when attempting to detect a phenomenon or variable that causes only subtle differences in the target reference [37,49].
In practice, it is sometimes difficult to obtain adequate training samples for supervised methods of hyperspectral image classification due to the requirement of large amounts of data from the field [50,51]. Unsupervised classifications, on the other hand, utilize statistical and numerical processes to outline groups of pixels with similar spectral features in the image without any preliminary sampling or training data [26,27,49,52]. This type of classification results in a thematic map of spectral clusters which must later be identified as classes and validated by the analyst in the field [26,27]. However, unsupervised classifications applied to PCA-reduced images are usually limited to the large and dominant classes appearing in the first PCs [37,49,52].
Implementation of efficient classification and mapping of forest plant species can contribute to forest monitoring and management [53][54][55][56][57]. However, this is a challenging task for most classification methodologies, particularly in the case of dense Mediterranean forests, which are highly diverse in species and contain many functionally and structurally similar species. To the best of our knowledge, such unsupervised classification has never been successfully implemented for a Mediterranean forest using HRS image datasets.
In this study, we test the compatibility of commonly used classifiers such as K-means and ISODATA for plant species classification in a Mediterranean forest ecosystem, and propose an improved PCA-based classification (PCABC) approach. The presented methodology provides better classification of the vegetation using the advantages of PCA for unsupervised variability detection of HRS data, while overcoming the known disadvantages: (a) low sensitivity to subtle differences in the target reference, (b) inability to detect small classes, and (c) a reduction of spectral data (bands) during the process. A Mediterranean forest in Mount Horshan, Israel was used as the test site for the methodology applied to data from Specim's airborne AisaFENIX HRS sensor, and this study reports on the outcome of the approach.

Research Site
The research was conducted in Mount Horshan (178 m ASL) in northern Israel. The annual precipitation is 584 mm (Israel Meteorological Service). The annual daily mean temperature is 20.12 • C-14.4 • C in the winter months and 25.4 • C in the summer months (Israel Meteorological Service). The area consists of both natural and planted Mediterranean forests covering an area of over 19 km 2 and encompassing a wide range of typical Mediterranean plant communities, including planted pine forests, dense woodlands with evergreen and deciduous oak trees, and short and tall Mediterranean shrubland (garrigue and maquis, respectively) [58,59]. The dominant species at the site are Pinus halepensis Mill., Quercus calliprinos Web, Quercus ithaburensis Decne., Pistacia lentiscus L., diverse vine species, sclerophyllous Mediterranean shrubs, and dwarf shrubs. The surrounding area includes urban, rural, and agricultural land uses. Figure 1 presents an aerial photo of the study area, overlaid with the Specim AisaFENIX image, including the location of the 257 ground-truth validation points.

Research Site
The research was conducted in Mount Horshan (178 m ASL) in northern Israel. The annual precipitation is 584 mm (Israel Meteorological Service). The annual daily mean temperature is 20.12°C-14.4°C in the winter months and 25.4°C in the summer months (Israel Meteorological Service). The area consists of both natural and planted Mediterranean forests covering an area of over 19 km 2 and encompassing a wide range of typical Mediterranean plant communities, including planted pine forests, dense woodlands with evergreen and deciduous oak trees, and short and tall Mediterranean shrubland (garrigue and maquis, respectively) [58,59]. The dominant species at the site are Pinus halepensis Mill., Quercus calliprinos Web, Quercus ithaburensis Decne., Pistacia lentiscus L., diverse vine species, sclerophyllous Mediterranean shrubs, and dwarf shrubs. The surrounding area includes urban, rural, and agricultural land uses. Figure 1 presents an aerial photo of the study area, overlaid with the Specim AisaFENIX image, including the location of the 257 ground-truth validation points.

Preprocessing
The Specim airborne AisaFENIX sensor used for this study covers a spectral range of 0.4-2.5 μm, with 448 spectral bands and a full width at half maximum of 0.0035-0.0055 μm. The analyzed image was acquired on 17 Sep 2014 at 06:55 UTC. A flight altitude of 792 m above ground level provided a spatial resolution of 1 m. ENVI image-processing package (Research Systems Inc. 1999) and ArcView GIS (ESRI Inc.) served as the major processing and analysis tools for this study.
The raw hyperspectral data were preprocessed using CaliGeoPRO software (Spectral Imaging Ltd., Oulu, Finland), resulting in a georectified radiance image. The radiance database was atmospherically corrected (ATC) using Atmospheric Correction Right Now (ACORN) software [60], with a gain factor calculated from a calibration site during the same flight based on the Supervised Vicarious Calibration method [61], resulting in a reflectance image that was validated on the ground using spectral measurements of control points acquired by Analytical Spectral devise (ASD). Topographic and illumination corrections were not used, and the shade pixels were masked out from

Preprocessing
The Specim airborne AisaFENIX sensor used for this study covers a spectral range of 0.4-2.5 µm, with 448 spectral bands and a full width at half maximum of 0.0035-0.0055 µm. The analyzed image was acquired on 17 Sep 2014 at 06:55 UTC. A flight altitude of 792 m above ground level provided a spatial resolution of 1 m. ENVI image-processing package (Research Systems Inc. 1999) and ArcView GIS (ESRI Inc.) served as the major processing and analysis tools for this study.
The raw hyperspectral data were preprocessed using CaliGeoPRO software (Spectral Imaging Ltd., Oulu, Finland), resulting in a georectified radiance image. The radiance database was atmospherically corrected (ATC) using Atmospheric Correction Right Now (ACORN) software [60], with a gain factor calculated from a calibration site during the same flight based on the Supervised Vicarious Calibration method [61], resulting in a reflectance image that was validated on the ground using spectral measurements of control points acquired by Analytical Spectral devise (ASD). Topographic and illumination corrections were not used, and the shade pixels were masked out from the image using the PCABC method, as elaborated in Section 2.4. Of the original 448 spectral bands, 155 unusable bands were omitted from the dataset, mainly over the spectral region of the absorption features of atmospheric gases (1.34-1.51 µm, 1.80-2.06 µm), as well as at the edges of the image spectrum, due to low signal-to-noise ratios (0.379-0.430 µm, and 2.38-2.50 µm). The final preprocessed image contained 293 bands that were used for further processing. Areas outside the forest area such as agricultural fields have been excluded manually from the image as they do not include Mediterranean plant species.

K-Means and ISODATA Classifiers
K-means and ISODATA classifiers are commonly used methodologies that do not require any preliminary input data, and were therefore chosen for this study. ISODATA and K-means unsupervised classifiers calculate class means that are evenly distributed in the data space and then iteratively cluster the remaining pixels using minimum distance techniques [26,27,62].
Each iteration recalculates means and reclassifies pixels with respect to the new means. Iterative class splitting, merging, and deletion are based on input threshold parameters. All pixels are classified to the nearest class unless a standard deviation or distance threshold is specified, in which case some pixels may be unclassified if they do not meet the selected criteria. The process continues until the number of pixels in each class changes by less than the selected pixel change threshold, or the maximum number of iterations is reached [26,27,62].
Both methods were applied to the PCA components produced for the Specim AisaFENIX reflectance image as is commonly practiced for HRS images [39]. Table 1 presents the K-means and ISODATA classification parameters that were used for the classification. We used ENVI image-processing package (Research Systems Inc. 1999) in order to apply these classifications classifiers on the reflectance image.

PCABC Processing
The PCABC procedure is principally a binary threshold applied to the first few PC bands set for identifying and clustering endmembers in an unsupervised approach. However, the process presented here attempts to reach higher sensitivity to subtle differences in the target reference and small classes by masking out each detected class from the dataset. The masked dataset is then reintroduced to the PCA and binary masking procedure. The procedure is performed in a sequential manner until no class can be identified. During this process, inverse PCA transformation is not applied, nor are supervised or unsupervised classification schemes used. At the first PCA iteration of the AisaFENIX reflectance image, the first PCs were analyzed in search of outlying bright and dark pixels, representing clusters that vary from the rest of the image. A close inspection of the PCs in comparison to the high-resolution orthophoto clearly showed that vegetation vs. non-vegetation pixels were the source of the highest variance. The density slice (DS) tool was used for binary thresholding and categorization of the pixel clusters. This algorithm divides image band gray-scale values into groups based on mean and standard deviation calculations (Research Systems Inc. 1999). Figure 3 illustrates the process of producing the non-vegetation mask by the PCABC method in a subscene of the image (marked with a red square). A comparison of the orthophoto ( Figure 3a) and Specim AisaFENIX RGB image ( Figure 3b) to the first PC component of the first PCA iteration ( Figure  3c) clearly shows that the highest detection variability is for vegetation vs. non-vegetation, the latter including ground, rock, roads, and shade (non-vegetation pixels appear in black). The DS tool was used to mark the non-vegetation pixels (non-vegetation pixels appear in red to magenta in Figures 3d, 3e). Finally, a non-vegetation mask was produced and applied to the image ( Figure 3f). At the first PCA iteration of the AisaFENIX reflectance image, the first PCs were analyzed in search of outlying bright and dark pixels, representing clusters that vary from the rest of the image. A close inspection of the PCs in comparison to the high-resolution orthophoto clearly showed that vegetation vs. non-vegetation pixels were the source of the highest variance. The density slice (DS) tool was used for binary thresholding and categorization of the pixel clusters. This algorithm divides image band gray-scale values into groups based on mean and standard deviation calculations (Research Systems Inc. 1999). Figure 3 illustrates the process of producing the non-vegetation mask by the PCABC method in a subscene of the image (marked with a red square). A comparison of the orthophoto ( Figure 3a) and Specim AisaFENIX RGB image ( Figure 3b) to the first PC component of the first PCA iteration ( Figure 3c) clearly shows that the highest detection variability is for vegetation vs. non-vegetation, the latter including ground, rock, roads, and shade (non-vegetation pixels appear in black). The DS tool was used to mark the non-vegetation pixels (non-vegetation pixels appear in red to magenta in Figure 3d,e). Finally, a non-vegetation mask was produced and applied to the image ( Figure 3f).
The rationale underlying the PCABC is that exclusion of dominant endmembers from the dataset may allow detection of new dominant endmembers in the remaining spatial subset, which retains all spectral bands. Reiterating this process in a sequential manner may result in a detailed classification of the dataset to most of its endmembers. This was done in the (second) iteration, as illustrated in Figure 4, which presents a subscene of PCA first iteration 1, component 1 (Figure 4a) next to a subscene of the second iteration performed on the reflectance image including all 293 bands, and masking out only non-vegetated pixels ( Figure 4b). There is a significant covariant difference between these two images; in Figure 4a, there are no significant differences among the vegetation pixels, whereas in Figure 4b, some vegetation pixels appear in white (higher PC values). The DS tool assisted in clustering these pixels ( Figure 4c). Using the highest DS values, an ENVI classification image was produced (Figure 4d), representing the first identified plant species. a subscene of the image (marked with a red square). A comparison of the orthophoto ( Figure 3a) and Specim AisaFENIX RGB image (Figure 3b) to the first PC component of the first PCA iteration ( Figure  3c) clearly shows that the highest detection variability is for vegetation vs. non-vegetation, the latter including ground, rock, roads, and shade (non-vegetation pixels appear in black). The DS tool was used to mark the non-vegetation pixels (non-vegetation pixels appear in red to magenta in Figures  3d, 3e). Finally, a non-vegetation mask was produced and applied to the image (Figure 3f).  The rationale underlying the PCABC is that exclusion of dominant endmembers from the dataset may allow detection of new dominant endmembers in the remaining spatial subset, which retains all spectral bands. Reiterating this process in a sequential manner may result in a detailed classification of the dataset to most of its endmembers. This was done in the (second) iteration, as illustrated in Figure 4, which presents a subscene of PCA first iteration 1, component 1 (Figure 4a) next to a subscene of the second iteration performed on the reflectance image including all 293 bands, and masking out only non-vegetated pixels (Figure 4b). There is a significant covariant difference between these two images; in Figure 4a, there are no significant differences among the vegetation pixels, whereas in Figure 4b, some vegetation pixels appear in white (higher PC values). The DS tool assisted in clustering these pixels (Figure 4c). Using the highest DS values, an ENVI classification image was produced (Figure 4d), representing the first identified plant species.
The identified class was masked out from the image before performing the next iteration. In this third iteration, five different clusters were identified at PCs 1-3. Of these, four were clustered from the second PC band. These clusters were transformed into five different ENVI classification images and were all masked out of the image. The fourth iteration was performed on the reflectance image with a mask including the non-vegetated pixels as well as all previously identified plant classes. At this stage (fourth iteration), no additional classes were detected. As a result of the relatively high spectral and spatial resolution of the image, in some cases a single tree crown with high PC values was assigned several DS ranges (Figure 5a). These were therefore merged into one cluster ( Figure 5b); DS values of: 10.8535-40.4227. In order to ensure that The identified class was masked out from the image before performing the next iteration. In this third iteration, five different clusters were identified at PCs 1-3. Of these, four were clustered from the Remote Sens. 2019, 11, 2800 7 of 19 second PC band. These clusters were transformed into five different ENVI classification images and were all masked out of the image. The fourth iteration was performed on the reflectance image with a mask including the non-vegetated pixels as well as all previously identified plant classes. At this stage (fourth iteration), no additional classes were detected.
As a result of the relatively high spectral and spatial resolution of the image, in some cases a single tree crown with high PC values was assigned several DS ranges (Figure 5a). These were therefore merged into one cluster ( Figure 5b); DS values of: 10.8535-40.4227. In order to ensure that the various PC clusters are consolidated accurately on one PC cluster, we compared them with KKL aerial orthophoto (high spatial resolution photography). Thus, we used two screens in parallel, one on which the aerial photograph was displayed, and on the other, the different PC clusters were displayed. This comparison allowed an accurate identification that a number of certain PC clusters are actually one PC cluster, representing the same canopy of a particular tree. the various PC clusters are consolidated accurately on one PC cluster, we compared them with KKL aerial orthophoto (high spatial resolution photography). Thus, we used two screens in parallel, one on which the aerial photograph was displayed, and on the other, the different PC clusters were displayed. This comparison allowed an accurate identification that a number of certain PC clusters are actually one PC cluster, representing the same canopy of a particular tree. This cluster was then masked out from the image before the next iteration using ENVI's binary mask option.

Validation Process
Accuracy assessment of the classified image is an important stage in all image classification procedures. The error matrix is the most frequently used method for accuracy assessment [63][64] as detailed in previous literature [64][65][66][67][68][69][70]. One critical step is to select a sufficient and representative number of ground-trough points for each class with a suitable sampling method. Random or stratified sampling approaches are often used for a robust assessment of classification results, as the collection of test samples based on a pure random sampling technique was impossible due to steep topography and high forest density. We executed a validation protocol similar to those used in similar remote areas [63]. To that end, plots sizing from 6000 to 14,000 square meters were defined manually using ArcView GIS software (ESRI Inc.) according to the orthophoto and have been selected in different reachable locations within the image. At each plot, we manually selected random points which were used as validation target points. The survey was conducted along transects extending from the closest road to the selected point, where all individual plants that were large enough to cover at least one image pixel and that constituted a single species (i.e., did not overlap with other species) were sampled. This resulted in different densities of surveyed individuals (N = 257), depending on the local structure of the vegetation (Figure 1). We regard this routine as a semi-random selection as was carried out by [63].
Each surveyed ground-truth validation point was located by GPS coordinates and marked on a high-spatial-resolution orthophoto. Each tree was botanically identified and measured for its height; in addition, trunk and canopy diameters were taken. The validation survey provided a database consisting of 30 transacts of 50 meters length. Overall, 257 plant individuals were overlain on the This cluster was then masked out from the image before the next iteration using ENVI's binary mask option.

Validation Process
Accuracy assessment of the classified image is an important stage in all image classification procedures. The error matrix is the most frequently used method for accuracy assessment [63,64] as detailed in previous literature [64][65][66][67][68][69][70]. One critical step is to select a sufficient and representative number of ground-trough points for each class with a suitable sampling method. Random or stratified sampling approaches are often used for a robust assessment of classification results, as the collection of test samples based on a pure random sampling technique was impossible due to steep topography and high forest density. We executed a validation protocol similar to those used in similar remote areas [63]. To that end, plots sizing from 6000 to 14,000 square meters were defined manually using ArcView GIS software (ESRI Inc.) according to the orthophoto and have been selected in different reachable locations within the image. At each plot, we manually selected random points which were used as validation target points. The survey was conducted along transects extending from the closest road to the selected point, where all individual plants that were large enough to cover at least one image pixel and that constituted a single species (i.e., did not overlap with other species) were sampled. This resulted in different densities of surveyed individuals (N = 257), depending on the local structure of the vegetation (Figure 1). We regard this routine as a semi-random selection as was carried out by [63].
Each surveyed ground-truth validation point was located by GPS coordinates and marked on a high-spatial-resolution orthophoto. Each tree was botanically identified and measured for its height; in addition, trunk and canopy diameters were taken. The validation survey provided a database consisting of 30 transacts of 50 meters length. Overall, 257 plant individuals were overlain on the classified images for the classification performance examination stage. The same ground truth points were used to evaluate classified images from PCBC, K-means and ISODATA classifiers and the corresponding error matrix were constructed independently. The producer's accuracy, user's accuracy and overall accuracy were calculated according to the error matrices [66].

K-Means and ISODATA Results
To quantitatively assess the results of the K-means and ISODATA classifiers, we conducted a validation process based on the ground-truth points collected in the field. The classes were not well-defined for either K-means or ISODATA and were mixed together with non-vegetation pixels. In general, the K-means classifier identified vegetation and non-vegetation pixels. Among the vegetation pixels, the classifier identified only two species: P. halepensis and oak trees (including both Q. calliprinos and Q. ithaburensis as one class) (Figure 6a).
The ISODATA classifier differentiated more species than the K-means classifier. The identified species were: P. halepensis, Q. calliprinos and Q. ithaburensis (a separate class for each), shrubs, and lianas. Moreover, the roads were identified as one class. Figure 6 presents subscenes of the results of these classifiers. Tables 2 and 3 present the validation results of the K-means and ISODATA classifiers, respectively, for the Components image, which was produced based on the Specim AisaFENIX reflectance image. As an initial stage, we assessed the distribution of plant species in the field site based on prior knowledge of existing plant species at the research site and assigned each automatically-classified category to a specific species or a general type of vegetation (e.g., "shrubs"). Since we used an automatic classification method, i.e., that is not based on data for calibration, we used all the ground-truthed data (N = 257) to validate the results of the K-means and ISODATA. To estimate the fit between the defined categories and plant identities, as mentioned before, each identified cluster represented crowns of similar plant species. Although the field survey included tree crown measurements, it was challenging to estimate the exact number of pixels representing each crown. Therefore, at the validation stage, in the case of a defined crown cluster covering the ground-truth validation point, it was considered an individual plant species/type rather than evaluated for its number of pixels. Hence, the confusion matrix column shown in Tables 2 and 3: "Number of classified points in image" represents K-means and ISODATA class clusters, respectively, appearing at the location of the ground-truth points, and the row: "Number of ground-data points" represents the number of individual plants of each species/type identified in the field survey. This type of matrix is widely used in remote-sensing studies to validate classification results [26,27].
As can be seen in Tables 2 and 3, only the P. halepensis class was identified with 100% producer and user accuracies by both K-means and ISODATA. The classes for other species were identified with lower accuracies. Classes detected by the ISODATA classifier were identified with higher accuracy. However, the overall accuracy of the K-means classifier was higher than the overall accuracy of the ISODATA classifier as it was calculated for a smaller number of classes. Moreover, both classifiers classified the roads in the research site as one cluster.

PCABC Results
Compared to the K-means and ISODATA classifiers, the PCABC resulted in the detection of six independent classes of vegetation in the studied Mediterranean forest (Figure 7). The classes appeared well-defined and spread across the image, aligned with the crowns of trees and shrubs ( Figure 7). As in most unsupervised classification methodologies with no preliminary input data aside from the image itself, the classes clustered by PCABC must be identified and recognized. As mentioned before, an initial stage was conducted. In this stage, we assessed the distribution of plant species in the field site based on prior knowledge of existing plat species at the research site and assigned each automatically-classified category to a specific species or a general type of vegetation (e.g., "shrubs"). Since we used an automatic classification method, i.e., that is not based on data for calibration, we used all the ground-truthed data to validate the results of the automatic classification and to estimate the fit between the defined categories (PCABC classes) and plant identities. Classes and plant species were identified based on the comprehensive field survey.
Overall, the six vegetation classes were identified as: P. halepensis (Figure 7a Table 4 presents the validation results of the PCABC. The validation process was performed in the same way as for the K-means and ISODATA classifiers.  Table 4 presents the validation results of the PCABC. The validation process was performed in the same way as for the K-means and ISODATA classifiers.    Table 4 presents all six identified classes as well as some unidentified pixels ("No class"). All six plant classes were detected with high accuracy. Note that the classes of P. halepensis, lianas, P. lentiscus, and Q. ithaburensis were detected with a phenomenal user accuracy of 100%. The high producer accuracy and the high Kappa coefficient indicate that very few misclassifications were observed for all classes.
An examination of the total classified forest area showed that approximately 41% of the research site was classified as non-vegetation and approximately 53% was classified as plant species. Only 6% of the research site remained unclassified after the completion of four PCABC iterations. Unclassified  Table 4 presents all six identified classes as well as some unidentified pixels ("No class"). All six plant classes were detected with high accuracy. Note that the classes of P. halepensis, lianas, P. lentiscus, and Q. ithaburensis were detected with a phenomenal user accuracy of 100%. The high producer accuracy and the high Kappa coefficient indicate that very few misclassifications were observed for all classes.
An examination of the total classified forest area showed that approximately 41% of the research site was classified as non-vegetation and approximately 53% was classified as plant species. Only 6% of the research site remained unclassified after the completion of four PCABC iterations. Unclassified data mostly included canopy marginal areas of the identified plant species. Figure 8 presents a thematic map excluding the non-vegetation pixels and encompassing all detected classifications.
Correspondence of the classes with the ground-truth points was high. For instance, classification by the PCABC method succeeded in detecting short trees (<2 m) with a canopy diameter as small as 1 m in the dense forest. Figure 9 presents an example of a P. halepensis sapling (height 1.5 m, canopy diameter 1 m), which was identified by the PCABC method growing within a much larger patch of shrubs. Moreover, the detection of lianas (a collection of three to five different species) growing on top of different tree canopies is not trivial due to the potential for mixed pixels. Figure 10 presents an example of a ground-truth validation point showing correct identification of a tree canopy covered by lianas.
Remote Sens. 2019, 11, x FOR PEER REVIEW 12 of 19 data mostly included canopy marginal areas of the identified plant species. Figure 8 presents a thematic map excluding the non-vegetation pixels and encompassing all detected classifications. Correspondence of the classes with the ground-truth points was high. For instance, classification by the PCABC method succeeded in detecting short trees (<2 m) with a canopy diameter as small as 1 m in the dense forest. Figure 9 presents an example of a P. halepensis sapling (height 1.5 m, canopy diameter 1 m), which was identified by the PCABC method growing within a much larger patch of shrubs. Moreover, the detection of lianas (a collection of three to five different species) growing on top of different tree canopies is not trivial due to the potential for mixed pixels. Figure 10 presents an example of a ground-truth validation point showing correct identification of a tree canopy covered by lianas.  Correspondence of the classes with the ground-truth points was high. For instance, classification by the PCABC method succeeded in detecting short trees (<2 m) with a canopy diameter as small as 1 m in the dense forest. Figure 9 presents an example of a P. halepensis sapling (height 1.5 m, canopy diameter 1 m), which was identified by the PCABC method growing within a much larger patch of shrubs. Moreover, the detection of lianas (a collection of three to five different species) growing on top of different tree canopies is not trivial due to the potential for mixed pixels. Figure 10 presents an example of a ground-truth validation point showing correct identification of a tree canopy covered by lianas.

Discussion
The PCABC methodology presented here was formulated to enable improved unsupervised classification of plant species and types in a Mediterranean forest. Figure 11a presents the mean spectra of all six PCABC-identified classes in the Specim AisaFENIX image. Most plant classes contained similar spectral features, with minor variations. Nonetheless, PCABC provided highly improved classification compared to the traditional unsupervised classification methodologies, resulting in the identification of all six classes of Mediterranean trees and shrubs with an overall accuracy of 91%. Figure 11b presents the mean spectra (full lines) ± one standard deviation spectra (dashed lines) calculated for Q. ithaburensis (light blue lines) and Q. calliprinos (dark blue lines). The plots of the spectra ± one standard deviation of these oak tree species are one above the other, indicating that the difference between these two dominant Mediterranean oak species is minor and mainly related to the overall albedo. This means that the spectral features are well defined and almost identical. In fact, one of the main biological differences between the two oaks is that Q. ithaburensis (upper curve) is winter deciduous with a thinner cuticle and hairy trichomes on the leaf surface, whereas Q. calliprinos (lower curve) is evergreen, having tough sclerophyllous leaves with a thick cuticle and no epidermal hairs. Moreover, as illustrated in Figure 11c, the ratio between the mean spectra of each of these classes was almost a straight line, with values of above 0.8 (in gray). Nevertheless, the two oak species were identified by PCABC with high user accuracies of 92% and 100%, respectively. Since both oak species are dominant in the studied forest, the ability to separate them is essential.
The class of shrubs was also identified with a relatively high user accuracy of 83%. Misclassification in this class was mostly related to it being a collection of up to six different shrub species with a wider range of spectral features compared to classes representing single species; therefore, they might resemble the spectral features of other classes ( Figure 11).

Discussion
The PCABC methodology presented here was formulated to enable improved unsupervised classification of plant species and types in a Mediterranean forest. Figure 11a  It should be noted that the accuracy of the classification was tested for identification of particular tree species rather than the entire canopy of each plant individual, which means that parts of the canopy may not have been identified as part of the classified tree due to shading or relief. However, as long as most of the visible canopy of a tree species (appearing in the high-resolution orthophoto) was classified, it was considered to be a single tree belonging to that class.
Unidentified pixels and misclassification may be attributed to two main sources. The first is an enhanced bidirectional reflectance distribution function (BRDF). In this case, because the acquisition of the image was at 06:55 UTC, the direction of the sun, and therefore the BRDF effects, might explain why our PCABC method did not relate all pixels to one of the six classes. The second source of error might be related to the manual clustering applied in the PCABC methodology. As noted, this methodology is based on a binary categorization by the ENVI DS tool (Research Systems Inc. 1999),  Figure 11b presents the mean spectra (full lines) ± one standard deviation spectra (dashed lines) calculated for Q. ithaburensis (light blue lines) and Q. calliprinos (dark blue lines). The plots of the spectra ± one standard deviation of these oak tree species are one above the other, indicating that the difference between these two dominant Mediterranean oak species is minor and mainly related to the overall albedo. This means that the spectral features are well defined and almost identical. In fact, one of the main biological differences between the two oaks is that Q. ithaburensis (upper curve) is winter deciduous with a thinner cuticle and hairy trichomes on the leaf surface, whereas Q. calliprinos (lower curve) is evergreen, having tough sclerophyllous leaves with a thick cuticle and no epidermal hairs. Moreover, as illustrated in Figure 11c, the ratio between the mean spectra of each of these classes was almost a straight line, with values of above 0.8 (in gray). Nevertheless, the two oak species were identified by PCABC with high user accuracies of 92% and 100%, respectively. Since both oak species are dominant in the studied forest, the ability to separate them is essential.
The class of shrubs was also identified with a relatively high user accuracy of 83%. Misclassification in this class was mostly related to it being a collection of up to six different shrub species with a wider range of spectral features compared to classes representing single species; therefore, they might resemble the spectral features of other classes ( Figure 11).
It should be noted that the accuracy of the classification was tested for identification of particular tree species rather than the entire canopy of each plant individual, which means that parts of the canopy may not have been identified as part of the classified tree due to shading or relief. However, as long as most of the visible canopy of a tree species (appearing in the high-resolution orthophoto) was classified, it was considered to be a single tree belonging to that class.
Unidentified pixels and misclassification may be attributed to two main sources. The first is an enhanced bidirectional reflectance distribution function (BRDF). In this case, because the acquisition of the image was at 06:55 UTC, the direction of the sun, and therefore the BRDF effects, might explain why our PCABC method did not relate all pixels to one of the six classes. The second source of error might be related to the manual clustering applied in the PCABC methodology. As noted, this methodology is based on a binary categorization by the ENVI DS tool (Research Systems Inc. 1999), where the value of the ranges is statistically evaluated. Therefore, some of the pixels may not have been correctly assigned. One of the common difficulties in both supervised and unsupervised classifications is the spectral similarity between classes, especially of vegetation [26,27,49,52,71].
Previous studies have found, in other ecosystems, that the abundance of lianas and vines has a negative impact on the aboveground biomass of forest trees [72][73][74]. This may be true for Mediterranean forests as well, and if so, may be useful as a predictor of changes in tree health and biomass, thereby contributing to forest management [75]. We believe that the PCABC methodology will contribute to improved monitoring, management, and conservation of Mediterranean forests, and that future use of this methodology will expand its performance to other fields as well.

Summary and Conclusions
PCABC is primarily an image-based methodology with no reference input. As such, it may be considered unsupervised. The methodology is designed to overcome difficulties faced by traditional classifiers, such as the high dimensionality of hyperspectral data and the high correlation of adjacent band data [26,27,71,76]. Commonly-used unsupervised classification methodologies, including those tested here, are mostly based on class-comparative statistics (e.g., mean and standard deviation or error) and spectral-based numerical calculations [49,52,63]. These were not sensitive enough for this study. In contrast, PCABC results reached an overall accuracy of 91% and as high as 100% accuracy for most classes. The methodology displayed high efficiency at recognizing all major plant species of the Mediterranean forest as well as distinguishing between different species of the same genus having nearly similar spectral features. Moreover, trees with crowns under 2 m in diameter and liana growing on top of different tree canopies were identified with high accuracy. To the best of our knowledge, such a detailed and accurate classification has never been achieved for diverse sclerophyllous Mediterranean forests by supervised or unsupervised classification. This ability was attributed to both the high spatial resolution of the Specim AisaFENIX data and the high performance of the PCABC.
The ability to detect subtle spectral differences between similar plant species or types and classify them clearly demonstrates PCABC's ability to overcome the known disadvantages of previous PCA-based methodologies, such as low sensitivity to subtle differences in the target reference as well as inability to detect small classes [25,36]. This is attributed to the advantages of PCA for unsupervised variability detection in HRS enhanced by the sequential processing. The scheme presented here utilizes known practices, such as PCA, but differs from previous classification-intended research in both concept and practice. First, PCA was not used in this procedure as a preprocessing stage to reduce data size prior to classification, but as a tool for reparability classification. Therefore, no inverse PCA was applied. Secondly, no spectral reduction was applied. To deal with the large HRS dataset, we simultaneously reduced the size of the data and classified it by spatial removal of the detected clustered areas rather than excluding bands from the original dataset. As no inverse transformation and no spectral reduction were applied, there was no loss of spectral data during the procedure. Furthermore, we used PCABC in an iterative procedure in which PCA is run on all of the original dataset bands at every iteration. Applying PCA to a smaller spatial subset of the image at every consecutive iteration reduced the variability compared to the original dataset, and therefore, increased the correlation detection at each iteration with less variables, while using all spectral data (bands) for the classification. In contrast to the conventional methodologies that were tested, PCABC results reached an overall accuracy of 91% and as high as 100% accuracy.
Due to its improved classification results, PCABC shows promise as a tool for unsupervised classification, enabling the mapping of diverse Mediterranean forests and potentially monitoring changes in the composition and structure of these dense forests, woodlands and shrublands. As noted, the PCABC methodology requires user-defined clustering for each iteration and is not yet fully automated. Therefore, both binary thresholding and the subjectivity of the applied manual clustering may result in misclassifications. The methodology has yet to be tested with other sensors (e.g., multispectral), backgrounds (e.g., urban, tropical forests, water bodies) or with subsequent datasets, and therefore, its further application to different forests is strongly warranted.

Funding:
The study received no financial funding.