Genus-Physiognomy-Ecosystem (GPE) System for Satellite-Based Classification of Plant Communities

Abstract: Vegetation mapping and monitoring is important as the composition and distribution of vegetation has been greatly influenced by land use change and the interaction of land use change and climate change. The purpose of vegetation mapping is to discover the extent and distribution of plant communities within a geographical area of interest. The paper introduces the Genus-Physiognomy-Ecosystem (GPE) system for the organization of plant communities from the perspective of satellite remote sensing. It was conceived for broadscale operational vegetation mapping by organizing plant communities according to shared genus and physiognomy/ecosystem inferences, and it offers an intermediate level between the physiognomy/ecosystem and dominant species for the organization of plant communities. A machine learning and cross-validation approach was employed by utilizing multi-temporal Landsat 8 satellite images on a regional scale for the classification of plant communities at three hierarchical levels: (i) physiognomy, (ii) GPE, and (iii) dominant species. The classification at the dominant species level showed many misclassifications and undermined its application for broadscale operational mapping, whereas the GPE system was able to lessen the complexities associated with the dominant species level classification while still being capable of distinguishing a wider variety of plant communities. The GPE system therefore provides an easy-to-understand approach for the operational mapping of plant communities, particularly on a broad scale.


Introduction
Plant communities are distinguishable patches of plant species formed within an area through the interaction of biotic and abiotic factors [1][2][3]. The distribution and composition of plant communities have been greatly influenced by land use history [4,5], and the interaction of land use change and climate change has greatly intensified the impacts on plant communities [6,7]. The organization and monitoring of plant communities are necessary to better understand the mechanisms of vegetation changes in response to global change.
Traditionally, vegetation maps of several parts of the world have been produced by manual delineation of the occurrence and distribution of vegetation types into cartographic or geographic environments [8]. This procedure has been facilitated by the visual interpretation of aerial or satellite images [9,10]. Nowadays, the mapping of vegetation types involves the numerical analysis of environmental data such as temperature, precipitation, and geology and/or the classification of aerial or satellite images [11][12][13]. A common practice for producing vegetation maps is to train a machine learning model on remote sensing images with a small set of ground truth data and then use the model to predict unseen data. Researchers have tried many sorts of remote sensing images, including multi-spectral and hyperspectral images obtained from satellites or aircraft, for the classification and mapping of vegetation types [14][15][16]. A number of machine learning classifiers, such as support vector machines, random forests, and neural networks, have been employed for this purpose [17][18][19][20].
The organization and mapping of plant communities is necessary to inform the occurrence and distribution of vegetation types in a region of interest [21,22].
Vegetation mapping involves two procedures: (i) organization of plant communities from a biogeographical or ecological point of view and (ii) delineation or mapping of plant communities into cartographic or geographic environments. In this paper, the word classification is reserved for classifying the satellite images, and the word organization is used for the classification of plant communities. Intensive field surveys have been done by pioneer researchers to identify and organize vegetation types in different parts of the world [23][24][25]. Some typical systems for the organization of vegetation types are summarized as follows: (i) Bioclimate [26,27]: mainly the effects of temperature and precipitation, for example, tropical rain forests, boreal forests, and Arctic meadows; (ii) Ecosystem [28]: associated ecological significance, such as alpine herbaceous and wetland herbaceous; (iii) Physiognomy [29][30][31]; (iv) Phytosociological association; and (v) Dominant species.
From the perspective of satellite remote sensing, among these five typical systems for the organization of plant communities (bioclimate, ecosystem, physiognomy, phytosociological association, and dominant species), the phytosociological association system stands apart: it is based on characteristic species rather than dominant species, whereas it is the dominant species that mostly determine the measured physical signals. In addition, the bioclimatic variables are not relevant to the mapping of vegetation types at finer spatial resolutions because they are available only at coarse spatial resolutions. Physiognomy and ecosystem are high-level categories that convey little about the detailed composition of vegetation types. Dominant species should be the final level of vegetation mapping. However, particularly on a broad scale, the enumeration of hundreds of dominant species is a cumbersome procedure, and the classification of satellite images into so many classes is very challenging.
Therefore, to cope with these limitations, this research introduces an intermediate level, namely Genus-Physiognomy-Ecosystem (GPE), between the physiognomy/ecosystem and dominant species for organizing plant communities. This paper assesses the potential of the GPE system for the classification of plant communities by employing machine learning techniques on multi-temporal Landsat 8 satellite images. The possible advantages of the GPE system for the broadscale operational mapping of plant communities are also discussed.

Study Area
This research was conducted in the Tohoku region of Japan, which is located in a cool temperate zone. The location of the study area is shown in Figure 1.

Enumeration of Dominant Plants
In Japan, plant communities have been surveyed at a national scale based on phytosociological units. With reference to existing field survey data, 126 dominant plant species were enumerated in the study area as shown in Table 1. Geolocations (longitudes and latitudes) of the dominant species were prepared with reference to existing survey data, visual interpretation of time-lapse images available in Google Earth, and confirmation with field observations. For each dominant species, 30-90 sample points were collected as the ground truth data from a homogenous area of at least 30 × 30 m and were distributed throughout the study area. The distribution of the ground truth data in the Tohoku region is shown in Figure 2.

Genus-Physiognomy-Ecosystem (GPE) System
On the basis of field observations in different locations in the Tohoku region, the Genus-Physiognomy-Ecosystem (GPE) system was conceived by applying genus and physiognomy/ecosystem inferences to the dominant species. The GPE is an intermediate level between the physiognomy/ecosystem and the dominant species for organizing plant communities. The GPE system is more detailed than the physiognomy/ecosystem level but simpler and more practical than the dominant species level. Table 2 describes the implementation of genus and physiognomy/ecosystem inferences on the dominant species. Despite this simplification, the GPE system is capable of distinguishing a wide variety of plant (ecological) communities, such as Quercus Evergreen Broadleaf Forest (EBF) (subtropical to warm temperate), Quercus Deciduous Broadleaf Forest (DBF) (cool temperate), and Quercus Shrub (alpine). Furthermore, the GPE system offers several benefits over the dominant species system. For example, the same Quercus crispula in the dominant species system can be organized into two different communities (Quercus DBF and Quercus Shrub) under the GPE system. The Quercus Shrub, known as Miyamanara in Japanese, is a conspicuous shrub community in the alpine region.
The GPE system included all dominant tree genera (deciduous/evergreen and conifer/broadleaf). However, only those shrub and herbaceous genera that are prominent in large patches were organized separately, such as Rhododendron Shrub and Miscanthus Herb. Identifying prominent patches of all shrub and herbaceous genera is complicated, and discriminating them in satellite images is more difficult still. Therefore, shrub and herbaceous communities were mostly organized with physiognomy/ecosystem inferences rather than the genus inference. Notably, physiognomy/ecosystem inferences, such as Wetland Herb and Alpine Herb, are more relevant to the shrub and herbaceous communities from the viewpoint of ecological and conservation significance.
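The organization rules described in this section can be pictured as a lookup from a dominant species and its physiognomy/ecosystem inference to a GPE class. The Python sketch below is illustrative only and is not the content of Table 2: the function name gpe_class, the set names, the abbreviations ECF and DCF, and the example records (including the wetland record) are assumptions made for the example.

```python
# Physiognomy codes for tree communities. EBF and DBF appear in the text;
# ECF and DCF (evergreen/deciduous conifer forest) are assumed abbreviations.
TREE_PHYSIOGNOMIES = {"EBF", "DBF", "ECF", "DCF"}

# Shrub and herbaceous genera kept as separate classes because they form
# prominent patches (illustrative subset of genera named in the text).
PROMINENT_LOW_GENERA = {"Quercus", "Rhododendron", "Miscanthus", "Carex"}


def gpe_class(species, physiognomy, ecosystem=None):
    """Return a GPE class label for one dominant species record.

    species     -- binomial name, e.g. "Quercus crispula"
    physiognomy -- e.g. "DBF", "Shrub", "Herb"
    ecosystem   -- optional ecosystem inference, e.g. "Wetland", "Alpine"
    """
    genus = species.split()[0]
    # All dominant tree genera, and prominent shrub/herb genera, keep the genus.
    if physiognomy in TREE_PHYSIOGNOMIES or genus in PROMINENT_LOW_GENERA:
        return f"{genus} {physiognomy}"
    # Remaining shrub/herb communities fall back to the ecosystem inference.
    if ecosystem is not None:
        return f"{ecosystem} {physiognomy}"
    return physiognomy


# Hypothetical example records (not taken from Table 1 or Table 2).
records = [
    ("Quercus crispula", "DBF", None),
    ("Quercus crispula", "Shrub", "Alpine"),
    ("Juglans mandshurica", "DBF", None),
    ("Phragmites australis", "Herb", "Wetland"),
]
labels = [gpe_class(*record) for record in records]
print(labels)
```

Keying the lookup on both species and physiognomy is what allows one species, such as Quercus crispula, to fall into two GPE classes.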

Processing of Landsat 8 Data
Landsat 8 data offer optical imagery in visible, near infrared, and shortwave and thermal infrared wavelengths. Standard terrain corrected (Level 1T) Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) scenes available from 2017 to 2019 over the Tohoku region of Japan were utilized. The Digital Numbers (DNs), delivered as 16-bit unsigned integers, were calibrated into Top-Of-Atmosphere (TOA) spectral reflectance for the OLI and brightness temperature (K) for the TIRS using the rescaling coefficients available in the metadata file. Clouds were removed using the separate Quality Assessment (QA) band information. Seven spectral bands (blue, green, red, near infrared, mid infrared, shortwave infrared, and thermal infrared) were extracted. The spectral bands were composited by calculating monthly median values pixel by pixel. In this way, 84 features (12 months × 7 bands) were generated for machine learning and classification.
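The calibration and compositing steps can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions, not the processing chain used in this research: the formulas are the standard Landsat 8 Level-1 rescaling equations (reflectance = M × DN + A, optionally divided by the sine of the sun elevation; radiance = M × DN + A, then T = K2 / ln(K1 / L + 1)), the coefficient values are typical metadata values used only for demonstration, and the function names and the data layout are hypothetical. Whether the sun-angle correction was applied in this research is not stated, so it is optional here.

```python
import math
from statistics import median


def toa_reflectance(dn, mult, add, sun_elevation_deg=None):
    """Convert an OLI digital number to TOA reflectance.

    mult/add are the reflectance rescaling coefficients from the metadata file.
    If a sun elevation is given, the sun-angle correction is applied as well.
    """
    rho = mult * dn + add
    if sun_elevation_deg is not None:
        rho /= math.sin(math.radians(sun_elevation_deg))
    return rho


def brightness_temperature(dn, mult, add, k1, k2):
    """Convert a TIRS digital number to TOA brightness temperature in kelvin."""
    radiance = mult * dn + add
    return k2 / math.log(k1 / radiance + 1.0)


def monthly_median_features(observations, n_bands=7):
    """Build 12 x n_bands features for one pixel.

    observations -- list of (month, band_values) pairs, month in 1..12;
                    band_values is None when the pixel was cloud-masked.
    A month with no clear observation yields None for each of its bands.
    """
    features = []
    for month in range(1, 13):
        clear = [v for m, v in observations if m == month and v is not None]
        for band in range(n_bands):
            values = [v[band] for v in clear]
            features.append(median(values) if values else None)
    return features


# Demonstration with typical (assumed) coefficient values.
rho = toa_reflectance(10000, 2.0e-5, -0.1)
temp_k = brightness_temperature(25000, 3.342e-4, 0.1, 774.8853, 1321.0789)

# Three clear scenes per month plus one cloud-masked scene in January.
observations = []
for month in range(1, 13):
    for value in (0.10, 0.30, 0.20):
        observations.append((month, [value] * 7))
observations.append((1, None))
features = monthly_median_features(observations)
print(len(features))
```

The median is used per month and per band, so a single outlying scene (for example, residual haze) has little influence on the composite.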

Machine Learning and Cross-Validation
The potential of satellite remote sensing data for the classification of plant communities at three hierarchical levels, namely (i) physiognomy, (ii) Genus-Physiognomy-Ecosystem (GPE), and (iii) dominant species, was evaluated by employing a machine learning technique. The pixel values corresponding to the ground truth (geolocation points) data were extracted for the dominant species organization of plant communities. For the other organizations of plant communities, the ground truth data of the constituent dominant species were merged. The number of ground truth data available for the dominant species organization varied from 30 to 90 for each class. Much larger numbers of ground truth data were available for the classification at higher levels (physiognomy and GPE) because each higher-level class pools the data of its constituent lower-level classes. The random forests classifier was employed for the classification of satellite images with the support of ground truth data. The classification performance was assessed by utilizing a 10-fold cross-validation method. The parameters of the model were fine-tuned with reference to the cross-validation accuracy. The kappa coefficient and f1-score were utilized as accuracy metrics for the quantitative evaluation.
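The evaluation metrics named above can be illustrated with a short, self-contained Python sketch. It is not the code used in this research: the classifier itself is omitted, y_pred stands for out-of-fold predictions from any classifier, and the function names, the use of stratified folds, and the macro averaging of the f1-score are assumptions, since the text does not specify them.

```python
from collections import Counter


def stratified_folds(labels, k=10):
    """Assign sample indices to k folds, class by class, in round-robin order."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    for label in sorted(by_class):
        for position, index in enumerate(by_class[label]):
            folds[position % k].append(index)
    return folds


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement beyond what is expected by chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: a single class everywhere
    return (observed - expected) / (1.0 - expected)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class f1-scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Toy labels (illustrative only, not data from this research).
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "a", "b", "a"]
kappa = cohen_kappa(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
folds = stratified_folds(["a"] * 30 + ["b"] * 90, k=10)
print(kappa, f1, [len(fold) for fold in folds])
```

In a 10-fold run, each fold is held out once while the model is trained on the other nine, and the held-out predictions are concatenated before the metrics are computed.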

Classification at Physiognomy Level
The confusion matrix computed with the 10-fold cross-validation approach is shown in Figure 3. The classification of plant communities at the physiognomy level showed high accuracy in terms of both the kappa coefficient (0.834) and the f1-score (0.879).
The class-wise accuracy of the physiognomy level classification of plant communities is summarized in Table 3. For the class-wise accuracy, the kappa coefficients varied from 0.613 to 0.912, and the f1-scores varied from 0.615 to 0.904.

Classification at Dominant Species Level
The classification of dominant species (126 classes) with the 10-fold cross-validation approach showed both a kappa coefficient and f1-score of 0.820. The overall accuracy is slightly lower than that of the physiognomy level classification. However, the class-wise accuracy analysis (Table 4) showed poor performance of the dominant species level classification for many dominant plants. Out of 126 dominant species, 45 species showed accuracy (kappa) lower than 80%, while nine species showed accuracy (kappa) lower than 60%. The classification of plant communities at the dominant species level introduced many misclassifications and undermined its application for the operational mapping of plant communities.

Classification at GPE Level
Table 5 shows the class-wise performance of the Genus-Physiognomy-Ecosystem (GPE) classes introduced in the research. Except for two classes (Pterocarya DBF and Carex Herb), at least 71% accuracy (kappa) was obtained for all GPE classes. Moreover, only five classes among 51 GPE classes showed accuracy (kappa) lower than 80%. The overall accuracy (kappa = 0.872; f1-score = 0.877) was also higher than that of dominant species level classification (kappa = 0.820; f1-score = 0.820).
Among the six Acer species (Table 4, rows 5-10), only three were classified with more than 80% accuracy, whereas the accuracy was very low (ranging from 0.370 to 0.747) for the other three species. These Acer species exhibit similar phenology and are difficult to discriminate from one another on the basis of spectral reflectance. When they were merged by the inference of common genus, an accuracy of 0.891 was achieved for Acer DBF (Table 5, row 2). A similar trend was obtained in almost all cases. The class-wise performance is important for operational mapping. The merging of similar species by the inference of genus also improved the classification of other genera with a single species. For example, the average classification performance of Juglans mandshurica and Fraxinus mandshurica (Table 4) was 0.748, whereas Juglans DBF and Fraxinus DBF (Table 5) showed an average performance of 0.83.
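The effect of merging spectrally similar species can be illustrated with a toy confusion matrix. The numbers below are invented for the illustration and are not results from Table 4 or Table 5; the function names are hypothetical. Two congeneric species (A1 and A2) that are frequently confused with each other are merged into one genus-level class, and the overall kappa coefficient is computed before and after merging.

```python
def kappa_from_matrix(cm):
    """Cohen's kappa from a square confusion matrix (rows: truth, cols: prediction)."""
    size = len(cm)
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(size)) / n
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(size)
    ) / (n * n)
    return (observed - expected) / (1.0 - expected)


def merge_classes(cm, groups):
    """Collapse a confusion matrix; groups lists the original indices per new class."""
    return [
        [sum(cm[r][c] for r in rows for c in cols) for cols in groups]
        for rows in groups
    ]


# Toy confusion matrix (illustrative values only).
cm = [
    [30, 18, 2],   # species A1, often predicted as A2
    [15, 32, 3],   # species A2, often predicted as A1
    [2, 3, 45],    # species B, a different genus
]
merged = merge_classes(cm, [[0, 1], [2]])  # A1 + A2 -> genus A; B unchanged

kappa_before = kappa_from_matrix(cm)
kappa_after = kappa_from_matrix(merged)
print(round(kappa_before, 2), round(kappa_after, 2))  # 0.57 0.85
```

Confusions between A1 and A2 become correct genus-level predictions after merging, which is why the agreement rises in this example; merging classes that are not confused with each other would not bring the same gain.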

Discussion
The collection of ground truth data for the supervised classification of dominant plant species is time-consuming and expensive. Plant species are mixed at the community level, and finding a homogeneous community of dominant species for use as the ground truth data is difficult. In addition, classification accuracy generally decreases as the number of similar classes increases. When the machine learning classifier cannot discern similar classes from the given satellite features, merging those classes can improve the performance. To lessen the complexities associated with the mapping of plant communities at the dominant species level, the Genus-Physiognomy-Ecosystem (GPE) system was conceived in the research to organize the plant communities according to genus and physiognomy/ecosystem inferences.
Satellite remote sensing deals with the spectral reflectance obtained from the whole land surface, which is shaped by vegetative, non-vegetative, topographic, and climatic characteristics. It should be easier to discriminate between two similar species (for example, Fagus crenata and Fagus japonica) located in different places than between similar species located nearby, as satellite-based signals are also affected by topographic and climatic variations. Therefore, the higher classification accuracies of plant communities obtained in the research might not be met during operational mapping, which has to deal with the discrimination of nearby pixels. Moreover, operational mapping involves applying the machine learning model tuned with the given set of ground truth data to unseen new data, which may reduce the performance further.
It should be noted that the purpose of this research was not merely to extract (merge) similar classes/species as dictated by higher (lower) performance of machine learning classifiers. The overarching purpose of vegetation mapping is to discover the extent and distribution of plant communities within a geographical area of interest to meet conservation and management goals. Therefore, the GPE system involves the organization of plant communities according to ecological and conservation significance by introducing appropriate genus and physiognomy/ecosystem inferences. It is nevertheless a flexible system: the classes can be adjusted in the future by expanding or shrinking the genus/ecosystem inferences as the characteristics (spatial, spectral, and temporal resolutions) of satellite imagery and machine learning techniques advance and the capability for classification and mapping improves. Nationwide mapping of plant communities on the basis of the GPE system is our plan for the immediate future.

Conclusions
This research evaluated the potential of satellite remote sensing data for the classification of plant communities at three hierarchical levels (physiognomy, Genus-Physiognomy-Ecosystem, and dominant species). The classification of 126 dominant plant species with multi-temporal satellite data in the Tohoku region of Japan showed that 45 dominant species could not be classified with accuracy (kappa) higher than 80%, while nine dominant species showed accuracy (kappa) lower than 60%. In contrast, with the GPE organization of plant communities, newly introduced in the research, at least 71% accuracy (kappa) was obtained for all GPE classes except two, and only 5 out of 51 classes showed accuracy (kappa) lower than 80%. The overall accuracy (kappa = 0.872; f1-score = 0.877) was also higher than that of the dominant species level classification (kappa = 0.820; f1-score = 0.820). The GPE system therefore provides a practical and easy-to-understand approach for the operational mapping and monitoring of plant communities, particularly on a broad scale.