Using Drones to Monitor Broad-Leaved Orchids (Dactylorhiza majalis) in High-Nature-Value Grassland

Abstract: Dactylorhiza majalis is a threatened indicator species for the habitat quality of nutrient-poor grassland sites. Environmentalists utilize the species to validate the success of conservation efforts. Conventionally, plant surveys are field campaigns where the plant numbers are estimated and their spatial distribution is either approximated by GPS or labor-intensively measured by differential GPS. In this study, we propose a monitoring approach using multispectral drone-based data with a very high spatial resolution (~3 cm). We developed the magenta vegetation index to enhance the spectral response of Dactylorhiza majalis in the drone data. We integrated the magenta vegetation index in a random forest classification routine among other vegetation indices and analyzed feature impact on model decision making using SHAP. We applied an image object-level median filter to the classification result to account for image artefacts. Finally, we aggregated the filtered result to individuals per square meter using an overlaying vector grid. The SHAP analysis showed that the magenta vegetation index had the highest impact on model decision making. The random forest model could reliably classify Dactylorhiza majalis in the drone data (F1 score: 0.99). We validated the drone-derived plant count using field mappings and achieved good results with an RMSE of 12 individuals per square meter, which is within the error margin stated by experts for a conventional plant survey. In addition to abundance, we revealed the comprehensive spatial distribution of the plants. The results indicate that drone surveys are a suitable alternative to conventional monitoring because they can aid in evaluating conservation efforts and optimizing site-specific management.


Introduction
Dactylorhiza majalis (DM) or broad-leaved marsh orchid is an indicator species for the habitat quality of nutrient-poor grassland sites. The species is widespread across Europe, and a significant portion of the population grows in Germany [1,2]. However, DM used to be far more common in Germany's grasslands in the 1950s. Conservationists have reported a strong decline in the national population for at least the past 20 years [1,3,4]. Due to the severe decline, the federal agency of nature conservation and all state agencies of nature conservation in Germany list DM on the Red List of species threatened with extinction [2]. Furthermore, central European countries have reported a similar decline in DM population, e.g., the Czech Republic [5] or Switzerland [6]. Likewise, they list the species in their national or regional lists of species threatened with extinction.
The causes of decline are generally agreed upon; intensification of agriculture, water drainage, scrub encroachment due to habitat abandonment, and forestation negatively impact the habitat conditions [1–5]. Conservationists have tried counteracting the population decline by restoring the species' original habitat conditions, but the success of the conservation measures remains to be seen. To support the conservation measures, regular monitoring of the population development is required. Conventional mapping approaches

Study Site
We conducted this study in the Lehmkuhlen reservoir (Figure 1), an alkaline, nutrient-poor fen in the uplands of Schleswig-Holstein, Germany, which reported the biggest state-wide DM population [22]. As a part of the uplands of Schleswig-Holstein, the Lehmkuhlen reservoir was formed during the Weichselian ice age [23]. At that time, the area originated as a lake but transitioned into a fen over time. Since the 1950s, parts of the Lehmkuhlen reservoir have transformed into a forest fen. With an area of 0.29 km², it is a small but species-rich area, where 60 plant species threatened with extinction, including DM, are reported [22,23]. The Lehmkuhlen reservoir, therefore, is protected by the European Habitats Directive due to the occurrence of alkaline fens (Natura 2000 code: FFH 7230) and transition mires (Natura 2000 code: FFH 7140) [24]. Due to the ecological importance of the area, conservation measures are conducted to preserve the rare habitat conditions [24]. In 2011, parts of the forest were removed to increase the development potential of endangered plant species. To prevent scrub encroachment in the open fen and retain favorable conditions for competitively weak plant species, conservationists mow the area once a year.

Figure 2 illustrates the methodological workflow of this study, which is divided into data acquisition, preprocessing, processing, aggregation, and validation. Data acquisition includes drone flight and field mapping. The preprocessing covers exploratory data analysis, subsequent feature engineering, and the creation of a reference dataset for model training and validation. During processing, we trained a random forest classifier and evaluated feature importance and interactions; we further removed redundant or nonpredictive features from the training dataset, retrained the model with the best-performing set of features, and applied the retrained classifier to the drone dataset before we assessed the accuracy of the classification results.
In the following aggregation, we polygonized classified pixel clusters (i.e., pixels of the same class in direct proximity) to vector image objects and applied an object-level filtering approach to remove invalid pixels from the subsequent inflorescence count. We assessed the accuracy of the remote-sensing-derived plant count by comparison with our field mapping results. In a final step, the resulting inflorescences were summed up to individuals per square meter in a Universal Transverse Mercator coordinate system.

Drone Data
Aerial images were taken during the flowering phase of DM on 6 July 2021, using a Wingtra One drone [25]. The weather during the flight was calm and consistently overcast. The overcast weather ensured a stable source of illumination during flight, thereby limiting the spectral variability between different images. The drone was equipped with a MicaSense Altum multispectral camera with spectral bands in the blue, green, red, red-edge, and near-infrared regions (refer to [26] for band designation). The camera had a focal length of 8 mm and a field of view of 48° × 37°. Each band captured data with 3.2 megapixels, resulting in an image size of 2046 × 1544 pixels. The mean flight altitude was 150 m. Flight planning and configuration were conducted using the proprietary mission control software WingtraPilot. The raw image data were processed to surface reflectance with the Pix4D mapper software version 4.6.4 by the commissioned company. In total, 3695 images were processed to a single orthomosaic with a spatial resolution of 3.4 cm.

In Situ Data
On the day of the drone flight, we conducted in situ measurements to collect validation data for the remote sensing plant count. The in situ plant counts followed an established methodology in ecology [1,27], i.e., we used a 1 m² frame, placed it randomly in the study site, and counted all plants of the target species within the square twice. We then took a top-down photo for reference and defined the center location of the square using a Global Positioning System (GPS) device (Garmin fēnix 5). In total, we performed 10 in situ plant counts at the study site. Additionally, we measured the inflorescence diameter of some randomly selected plants to approximate the average spatial coverage of a DM inflorescence. This measure was used for comparison with the spatial resolution of the drone data. The measurements showed that the average inflorescence diameter of DM was approximately equal to or slightly smaller than the spatial resolution of the drone dataset.

Labeling a Reference Dataset for Model Training and Validation
To identify DM, we created a reference dataset containing a DM-positive and a DM-negative class by applying a split sampling strategy. For the DM-positive class, we selected 2000 pixels on the basis of a visual inspection of the drone data. For the DM-negative class, we applied a pseudo-random sampling approach recommended by [28]. We undersampled the DM-negative class since it made up the majority of the pixels. We used the QGIS "random points in extent" function to randomly select 2000 points in the study site and subsequently sampled the underlying pixel values. We manually labeled these pixels by visual interpretation and removed all DM-positive pixels. The missing pixels were iteratively replaced by new randomly selected pixels until all pixels belonged to the DM-negative class. In summary, the reference dataset was balanced and consisted of 4000 pixels (2000 pixels for each class). Subsequently, we split the reference dataset into a training and holdout dataset for model training and independent evaluation (refer to Section 2.7).
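The iterative replacement of DM-positive draws amounts to a rejection-sampling loop. The following is a minimal sketch of that idea, not the QGIS workflow used in the study: the function `sample_negative_pixels`, the `is_dm_positive` callback (a stand-in for the manual visual labeling), and the toy raster extent are all hypothetical.

```python
import random

def sample_negative_pixels(n_target, width, height, is_dm_positive, seed=0):
    """Draw random pixel positions until n_target DM-negative pixels remain.

    is_dm_positive is a hypothetical callback that replaces the manual
    visual interpretation performed by a human in the study.
    """
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < n_target:
        missing = n_target - len(negatives)
        # Draw exactly as many new candidates as are currently missing.
        candidates = [(rng.randrange(width), rng.randrange(height))
                      for _ in range(missing)]
        # Keep only DM-negative candidates; duplicates collapse in the set.
        negatives.update(p for p in candidates if not is_dm_positive(p))
    return sorted(negatives)

# Toy labeling rule (assumption): every 50th column holds magenta pixels.
toy_label = lambda p: p[0] % 50 == 0
sample = sample_negative_pixels(2000, width=1000, height=1000,
                                is_dm_positive=toy_label)
```

Because each round draws only as many candidates as are still missing, the loop ends with exactly the requested number of DM-negative pixels.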
Within the scope of image classification, a pixel can hold different values, i.e., features. In this study, the list of features included the spectral reflectances of the available drone bands and a series of vegetation indices we calculated for the subsequent analysis. All vegetation indices incorporated in the subsequent random forest classification routine except for the MaVI are listed in Table 1. We describe the ideas and implementation of the MaVI in detail in the next section.
Table 1. Vegetation indices considered for this study's random forest classification. Abbreviations: B = blue band; G = green band; R = red band; NIR = near-infrared band.

Magenta Vegetation Index: Main Ideas and Practical Implementation
We analyzed the spectral signatures of different land-cover types in the Lehmkuhlen reservoir to identify spectral characteristics of magenta-colored vegetation (Figure 3). We found that magenta-colored flowers tended to have relatively high reflectance values in the blue and red bands, while green vegetation showed the characteristic green peak. We identified the highest potential to differentiate magenta-colored vegetation, soil, and water in the NIR. The reflectance spectra of the shallow and muddy water puddles remained approximately constant between 3% and 4% over the entire spectrum. Vegetation showed a sharp increase in reflectance from the red to NIR, while the reflectance curve of bare soil steadily increased with increasing wavelength.
We, therefore, propose the magenta vegetation index (MaVI) as follows:
[Equation (1): MaVI formula]
where B, G, R, and NIR are the spectral bands of the sensor in the blue, green, red, and NIR regions of the electromagnetic spectrum. The idea of the index is based on the portion of magenta (defined as the sum of the reflectance values of the blue and red bands) in the visible (VIS) bands. By subtracting the reflectance in the green band from the magenta value, the first term forces pixels with a pronounced green peak to be negative. The spectral response of soil and water surfaces, however, might result in similar positive index values as magenta flowers since the ratios of VIS bands are comparable. To increase the separability, we introduced two scaling factors. By subtracting the VIS/NIR ratio from 1, MaVI values of water surfaces become negative. Bare soil and magenta-colored flowers are both scaled down by the first scaling factor, but the scaling effect is more pronounced for soil surfaces since their VIS/NIR ratio results in higher values compared to the magenta flowers. The second scaling factor highlights the red edge of vegetation scaled by its NIR reflectance.

Random Forest Classification
Random forest is an ensemble classifier proposed by [41] that combines the predictions of multiple decision trees via a majority vote into a single class assignment. For this study, we utilized scikit-learn's implementation of a random forest classifier [42]. We partly adopted the set of parameters for a random forest model suggested by [43]. We set the number of decision trees in the forest to n = 500 and the maximum number of features to consider when splitting a node to the square root of the total number of features available (max_all ≈ 5; max_selected ≈ 3). To prevent overfitting and reduce computational time, we set the maximum depth of a decision tree to 5 as a model constraint. For model training, we randomly split the reference dataset into a training (50%) and holdout dataset (50%), applied a fivefold cross-validation scheme on the training dataset, and performed the accuracy assessment on the holdout dataset using accuracy metrics derived from a confusion matrix such as precision, recall, and F1-score. We further performed a qualitative validation by visually comparing the classification results and the original drone dataset.
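The configuration described above maps onto scikit-learn roughly as in the following sketch. It is an illustration under stated assumptions rather than the study's code: the data are synthetic (two Gaussian classes with seven features standing in for the labeled pixels), and the random seeds and the reduced sample size are arbitrary choices made to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the reference dataset (assumption: 7 features,
# 200 pixels per class instead of the 2000 used in the study).
rng = np.random.default_rng(42)
n_per_class, n_features = 200, 7
X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, n_features)),   # DM-negative
               rng.normal(2.0, 1.0, (n_per_class, n_features))])  # DM-positive
y = np.repeat([0, 1], n_per_class)

# Random 50/50 split into training and holdout data.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.5, random_state=42)

# 500 trees, sqrt(n_features) candidate features per split, depth capped at 5.
clf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                             max_depth=5, random_state=42)

# Fivefold cross-validation on the training data only.
cv_f1 = cross_val_score(clf, X_train, y_train, cv=5, scoring="f1")

# Final fit and accuracy assessment on the untouched holdout data.
clf.fit(X_train, y_train)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_hold, clf.predict(X_hold), average="binary")
```

Keeping the holdout data out of the cross-validation loop means the reported precision, recall, and F1-score are computed on pixels the model never saw during training.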

Feature Selection and Model Interpretation
In this study, we employed the concept of SHAP (Shapley additive explanations) values to fairly quantify the contribution of a feature to model predictions [44]. Using SHAP, we assessed the predictive capabilities of the given features to classify magenta-colored flowers (i.e., DM). The concept behind SHAP originated from Lloyd Shapley's work on game theory [45] to assess the contribution of individual players, i.e., the dataset features, to a cooperative game, i.e., the model prediction. For the classification of each pixel (i.e., the model prediction), the marginal contribution of a feature (i.e., the SHAP value) is calculated as the weighted average of changes in model predictions over all possible feature permutations of a given dataset. We utilized the TreeExplainer [46] of the SHAP Python package to estimate the SHAP values of our model predictions. The model "payout" was the probability that a pixel was assigned to the DM-positive class. Using the classification test dataset, we created a SHAP beeswarm plot to visualize the global feature importance and to relate the importance ranking to the distribution of a feature. Based on this analysis, we removed nonpredictive features from the training data and retrained the random forest model with the reduced dataset. After retraining, the model underwent a final classification accuracy assessment as described in the previous section. With the differences in accuracy measures before and after feature selection, we assessed the effect of the feature selection on the quality of the classification. Additionally, we created a change detection raster to quantitatively derive and visualize the changes in pixel class assignments due to the feature selection.
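The permutation-based definition of a Shapley value can be made concrete with a brute-force sketch for a tiny toy model. This is not how TreeExplainer works internally (it uses a much more efficient tree-specific algorithm); the function `shapley_values`, the toy model, and the convention of replacing absent features by a baseline value are assumptions made purely for illustration.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a baseline input.

    Features not yet in the coalition keep their baseline value
    (a simplifying assumption for this illustration).
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]        # feature i joins the coalition
            new = predict(current)
            phi[i] += new - prev     # marginal contribution of feature i
            prev = new
    # Average the marginal contributions over all feature orderings.
    return [p / len(orderings) for p in phi]

# Toy "model" with an interaction between features 0 and 1;
# feature 2 has no influence on the output.
def toy_model(f):
    return 0.5 * f[0] + 0.2 * f[1] + 0.3 * f[0] * f[1] + 0.0 * f[2]

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The contributions add up to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is split evenly between the two features involved, which is the sense in which the attribution is "fair".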

Remote Sensing Plant Count Methodology
The goal of our study was to develop a remote-sensing-based DM mapping with results comparable to conventional field mappings, i.e., the number of plants per unit area. For this, we used zonal statistics to count pixels per unit area, i.e., per 1 m². To define the areal ratio between a DM inflorescence and a single image pixel, we assumed that one classified DM-positive pixel represented one DM inflorescence (see Section 2.4).
We designed two separate zonal statistic aggregations: one for assessing the accuracy of the remote sensing plant count and another as a proposal for a remote sensing product for practical DM monitoring. For assessing the remote sensing plant count accuracy, we created a square buffer with a side length of 1 m around each GPS-logged in situ plant count (subsequently named the reference squares). We then counted all pixels classified as DM-positive in different counting settings, which we applied to optimize the remote sensing plant count against the in situ plant counts. The baseline setting simply counted all DM-positive pixels in each reference square. In all other counting settings, we applied a three-step filter approach with alternating thresholds before the remote sensing plant count. The filter approach was structured as follows:
1. Polygonize clusters of neighboring DM-positive pixels into image objects;
2. Calculate a filter threshold for each image object on the basis of the most descriptive feature of the image classification;
3. Remove all pixels below the threshold from the remote sensing plant count.
We tested different filter thresholds by calculating object-level percentiles in 10% steps starting at the 10% percentile and ending at the 90% percentile. Additionally, we tested the object-level mean as a filter. The quality of the remote sensing plant counts was determined by calculating the root-mean-square error (RMSE) between the remote sensing plant counts and the in situ plant counts. The best counting setting was defined by the smallest error value.
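The threshold search can be sketched in a few lines of plain Python. The example below is a hedged illustration, not the study's implementation: the helper names, the linear-interpolation percentile, the feature values, and the field counts are all made up, and each image object is represented simply as a list of per-pixel feature values.

```python
import math

def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def filtered_count(objects, q):
    """Count the pixels at or above each object's own q-th percentile."""
    total = 0
    for feature_values in objects:   # one list of feature values per object
        threshold = percentile(feature_values, q)
        total += sum(1 for v in feature_values if v >= threshold)
    return total

def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(observed))

# Hypothetical data: per reference square, a list of image objects, each
# holding the feature values of its DM-positive pixels.
squares = [
    [[0.2, 0.4, 0.9], [0.5, 0.6]],
    [[0.1, 0.3, 0.35, 0.8]],
]
in_situ = [2, 1]   # made-up field counts for the two squares

# Evaluate the 10th to 90th percentile in 10% steps and keep the best one.
errors = {q: rmse([filtered_count(sq, q) for sq in squares], in_situ)
          for q in range(10, 100, 10)}
best_q = min(errors, key=errors.get)
```

Computing the threshold per object rather than globally lets every pixel cluster keep its own brightest pixels, regardless of how its absolute feature values compare with those of other clusters.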
For practical DM monitoring, we applied a spatial aggregation approach. In contrast to the randomly chosen reference squares of our field campaign, we created a Universal Transverse Mercator polygon grid (EPSG: 32632) overlaying the study site, with the size of each grid cell being 1 m². The plant counting process remained the same as for the best counting setting of the accuracy assessment.
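In a metric projected coordinate system, aggregating to a 1 m grid reduces to flooring the coordinates of each retained pixel. The sketch below assumes the filtered DM-positive pixels are available as (easting, northing) pairs in meters; the function name and the coordinates are hypothetical, and a GIS zonal statistics tool would achieve the same result on the polygon grid.

```python
import math
from collections import Counter

def count_per_grid_cell(points, cell_size=1.0):
    """Count points per square grid cell in a projected (metric) CRS.

    points are (easting, northing) tuples in meters, e.g., the centers
    of the DM-positive pixels that passed the object-level filter.
    """
    counts = Counter()
    for easting, northing in points:
        cell = (math.floor(easting / cell_size),
                math.floor(northing / cell_size))
        counts[cell] += 1
    return counts

# Made-up UTM zone 32N coordinates (EPSG:32632), for illustration only.
pixels = [(586000.10, 6010000.20), (586000.90, 6010000.70),
          (586001.05, 6010000.50), (586003.40, 6010002.99)]
grid = count_per_grid_cell(pixels)
```

Each key of `grid` identifies the lower-left corner of a grid cell in whole meters, and the value is the number of individuals per square meter under the one-pixel-per-inflorescence assumption.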

Ambiguity in the Drone Dataset
Difficulties arose during image interpretation as identifying the DM-positive class was shown to be ambiguous in some cases (Figure 4). Although, in theory, the inflorescence area of the target species is equal to or smaller than the spatial resolution of the drone dataset, in practice, multiple neighboring pixels may appear magenta-colored. Several factors may be responsible for this phenomenon:

• Mixed pixel phenomena, due to (1) a DM individual located at the common boundary of multiple pixels, (2) multiple DM individuals in direct proximity that partly occupy multiple neighboring pixels, or (3) DM individuals which did not grow perfectly straight and, therefore, appeared in neighboring pixels;
• Adjacency effects, i.e., the magenta flowers spectrally superimpose the neighboring pixels;
• Motion blur caused by camera movement during exposure;
• Keystone effect of the camera, which may cause a slight cross-track displacement.
Since we had no details about the camera calibration and were missing flight details of the drone survey, an in-depth discussion about potential influence factors seemed to be of limited use. We, therefore, decided to cope with the present data quality, which may lead to misclassification [47,48]. To minimize the number of false-positive labeled pixels in the reference data, we only used the purest magenta-colored pixel of a pixel cluster (i.e., the pixel with the highest surface reflectance in the blue and red bands). We assumed that these pixels most likely represented a DM individual. In contrast to the effect on false-positive labeling, the ambiguity problem is a minor advantage for avoiding false-negative labeling in the reference data. A pixel cluster clearly indicates the presence of magenta-colored vegetation; in contrast to pixel-based classifications, false-negative labeling, therefore, is unlikely. For the DM-positive class, the reference dataset represented a characteristic value range of the data features, and some ambiguous pixels would most likely fall within these characteristic value ranges since pixels affected by the ambiguity problem still represented the underlying band relationship of magenta-colored vegetation.

Classification Results before Feature Selection
The accuracy assessment of the classification before the feature selection resulted in very high accuracy scores on the holdout dataset. We can report a precision score of 0.99, a recall score of 0.99, and an F1-score of 0.99. The accuracy scores suggest a nearly perfect differentiation of DM-positive and DM-negative classes. However, the high accuracy scores may be inflated due to this study's sampling design. The sampling design for creating a labeled reference dataset, from which we derived the holdout dataset, was partially probabilistic and partially systematic, which may have inflated accuracy scores due to sample selection bias [48,49]. Moreover, the methodological drawback might harm the generalization potential of the accuracy assessment, since the holdout dataset does not necessarily represent the distribution of the underlying population [49]. A coping strategy for the latter would have been to add more labeled pixels to the reference dataset. However, we avoided implementing this strategy since the ambiguity problem limited our labeling capabilities. Additionally, adding more reference data would not have solved the sample selection bias problem, but may have even worsened it. For the quality assessment, however, we suggest that the limitations of the accuracy assessment played a minor role.
Due to the limitations of the quantitative accuracy assessment, the qualitative accuracy assessment became more important. The visual comparison suggested a noticeably good performance of the random forest model (Figure 5). Other than some rare outliers, magenta-colored pixels were reliably assigned to the DM-positive class. However, the results show that the classifier suffered from the ambiguity problem. Although the model was able to correctly assign the DM-negative class to edge pixels of ambiguous pixel clusters, the class assignment appeared to be inconsistent.
In addition to the abovementioned restrictions, we identified leafless tree branches as a source of systematic false assignments of the DM-positive class. Spectra of tree branches showed similar reflectance curves to DM spectra, with an almost constant reflectance level in the VIS and a steep increase in the NIR similar to that of DM spectra. The high spectral similarities explain the misclassification.

Feature Selection and Predictive Performance of the MaVI
The feature with the highest impact was the MaVI (Figure 6). For the random forest model, the highest MaVI values corresponded to an increased probability of assigning the DM-positive class to a pixel. Furthermore, the small variation around 0 demonstrates the high impact of the MaVI for almost all model predictions. The MaVI's high position in the feature importance ranking and the consistently high impact on model predictions support our hypothesis on the MaVI's capabilities to detect magenta-colored vegetation.

In addition to the MaVI, all features up to rank 7 showed an observable impact on the random forest model to increase its predictive power (index abbreviations are listed in Table 1), i.e., the CVI, the red band, the NIR band, the red-edge band, the GARI, and the SAVI. High reflectances in the NIR, red-edge, and red bands were associated with a higher probability of the DM-positive class. The latter turned out to be the key factor to differentiate magenta- and green-colored vegetation since the spectra showed the largest differences between the two land-cover types in this wavelength region. Vegetation indices such as the CVI and the GARI integrated the green and red bands. The CVI highlights the redness of a pixel, whereas the GARI highlights the greenness. Each, therefore, highlights the band relationship in the opposite extreme; higher CVI values and lower GARI values were often associated with an increase in prediction probability toward the DM-positive class. The SAVI increased the probability of a DM-positive class assignment in association with higher index values. We deduce that the model utilized the SAVI mainly for distinguishing between vegetated and nonvegetated pixels since the index formula does not describe the relationship between the green and red bands that would be needed to distinguish between magenta- and green-colored vegetation.
On the basis of our analysis of the beeswarm plot, we decided to reduce the model training dataset to the features with an observable impact on increasing the probability of predicting the DM-positive class. In summary, the reduced training dataset consisted of the following features: MaVI, CVI, the red band, the NIR band, the red-edge band, the GARI, and the SAVI.

Classification Result after Feature Selection
The accuracy assessment of the classification after feature selection still resulted in very high accuracy scores on the test dataset. We calculated a precision score of 0.99, a recall score of 0.99, and an F1-score of 0.99. Since the accuracy scores were identical for both classification results, i.e., before and after feature selection, the former interpretation and discussion of the metrics are generally applicable to the latter classification accuracy scores. On the basis of the identical accuracy scores, we further deduce that the feature selection based on SHAP values was successful and that we correctly removed features with low predictive capabilities. To extrapolate the differences in class assignment to the entire drone dataset, we created a change detection raster that compares both classification results. By removing all features of low predictive power, the classification result for the entire dataset changed by only 0.0002% of the total number of pixels (~73 million). This low percentage of changed pixels confirms that the dropped features had a negligible impact on the class assignment of the random forest model. We, therefore, propose that the interpretation and discussion of the former qualitative classification result assessment are generally applicable to the classification result of the entire dataset. A visual inspection of regions with class changes (Figure 5) revealed a slight improvement for ambiguous pixel clusters; some edge pixels of clusters were now assigned to the DM-negative class. It has to be noted, however, that the new class assignment was not consistent, and the ambiguity problem persisted for most of the medium- to large-sized clusters of the DM-positive class.
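The change detection comparison can be sketched as a pixel-wise inequality test between the two classification rasters. The following is a minimal illustration on invented toy rasters (1 = DM-positive, 0 = DM-negative), not the processing chain used in this study; the function names are hypothetical. The last lines convert the reported share into an approximate pixel count, which is only a rough figure because both reported numbers are rounded.

```python
def change_raster(before, after):
    """Mark each pixel with 1 if its class differs between the two rasters, else 0."""
    return [
        [int(a != b) for a, b in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


def changed_percentage(change):
    """Share of changed pixels in percent of all pixels."""
    total = sum(len(row) for row in change)
    changed = sum(sum(row) for row in change)
    return 100.0 * changed / total


# Toy classification results before and after feature selection (invented).
before = [
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
after = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],  # one edge pixel switched to the DM-negative class
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

change = change_raster(before, after)
print(changed_percentage(change))

# Rough magnitude implied by the reported figures: 0.0002% of ~73 million pixels.
approx_changed_pixels = round(73_000_000 * 0.0002 / 100)
print(approx_changed_pixels)
```

Under the reported figures, the change therefore affects on the order of 150 pixels in the entire dataset.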

Remote Sensing Plant Count Accuracy Assessment
The remote sensing plant count accuracy assessment is summarized in Tables 2 and 3. Counting all DM-positive pixels in the reference squares (the baseline setting) resulted in the highest error of all count settings (RMSE: 42 individuals per square meter). In nine out of 10 reference squares, the number of DM individuals was severely overestimated compared to the in situ plant count, with a mean relative overestimation of 76%. In the remaining reference square, the baseline setting underestimated the in situ plant count by 17%. We identified the ambiguity problem as the main cause of overestimation by comparing the classification result and the underlying drone dataset in the corresponding reference squares. Counting the pixels in a unit area is a direct aggregation of the classification results and, therefore, inherits the inflated number of DM-positive pixels in pixel clusters.

Table 2. Remote sensing plant count after applying different filter settings to the classification result. (Table body not recoverable; surviving column labels: "≥90% Percentile", "Mean Filter".)

The overestimation outweighed the underestimation in terms of both the number of cases and the magnitude, and it accounted for the majority of the high error values of the baseline setting. We, therefore, applied an object-level threshold-based filter before counting to improve the error metric. We achieved the greatest improvement by applying a median filter, which yielded an RMSE of 12 individuals per square meter and is thus within the error margin stated by experts for a conventional plant survey. A similar error was achieved by an object-level mean filter with an RMSE of 13 individuals per square meter.
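The effect of an object-level filter on the error metric can be sketched as follows. The sketch rests on one plausible reading of the filter, which is an assumption and not a description confirmed by this section: every image object carries a per-pixel score (e.g., an index value), and only pixels at or above the object's own median (or mean) are counted. All scores and in situ counts are invented toy values, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, median


def filter_object(scores, statistic=median):
    """Keep the pixels whose score reaches the object's own threshold statistic."""
    threshold = statistic(scores)
    return [score for score in scores if score >= threshold]


def count_square(objects, statistic=None):
    """Count pixels in a reference square; statistic=None is the unfiltered baseline."""
    if statistic is None:
        return sum(len(scores) for scores in objects)
    return sum(len(filter_object(scores, statistic)) for scores in objects)


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed counts."""
    return sqrt(mean([(p - o) ** 2 for p, o in zip(predicted, observed)]))


# Toy reference squares: each square is a list of objects, each object a list of scores.
squares = [
    [[0.2, 0.4, 0.6, 0.8], [0.5, 0.7]],
    [[0.3, 0.5, 0.9]],
]
in_situ = [4, 2]  # invented field counts per square

baseline = [count_square(objects) for objects in squares]
filtered = [count_square(objects, median) for objects in squares]

print(baseline, rmse(baseline, in_situ))
print(filtered, rmse(filtered, in_situ))
```

Passing `mean` instead of `median` to `count_square` gives the corresponding mean filter under the same assumption.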

Assessing the Spatial Distribution and Abundance of Dactylorhiza majalis
In the open fen west of the linear trench, DM is widespread (Figure 7). We observed a north-south gradient of the DM population in terms of both spatial distribution and plant abundance. In the northern region of the open fen, DM formed large connected clusters with remote sensing plant counts ranging from five to over 100 DM individuals per square meter, although counts below 50 individuals per square meter predominated. Nevertheless, multiple DM hotspots (remote sensing plant count per square meter >50) were present in the northern area of the open fen, noticeably forming adjacent clusters. In this part, we derived a maximum of 164 DM individuals per square meter. In the southern part of the open fen, both the size of connected DM clusters and the number of DM individuals per square meter decreased. DM clusters were more sparsely spread than in the northern region. The remote sensing plant count ranged from five to 60 individuals per square meter, with the lower counts predominating. The upper end of this range occurred in only a single square.
East of the aforementioned trench, only a single DM cluster with a remote sensing plant count ranging between five and 30 individuals per square meter existed. In the area which was subject to the forest fen removal in 2011, four squares above the lower display threshold existed. Two of these squares contained the calibration targets laid out for the drone flight. For the remaining two squares, we were unable to derive a reliable explanation. From a visual inspection, the drone data indicated no DM pixels in the area. Considering the direct neighborhood of the squares, this observation is supported by the fact that the area was surrounded by water puddles and the squares were located at a considerable distance from the boundary between the former forest fen and the open fen. The distance in particular indicated the absence of DM. The literature suggests that the majority of the DM diaspore is spread in direct proximity to its source, and that DM growth depends on the presence of mycorrhizal fungi in the soil [1], which limits the speed and distance of population spread. The MaVI, however, showed comparatively high index values in these squares, which may have been caused by leafless tree branches. Since we lack in situ data for these specific locations, we are unable to exclude the possibility (although unlikely from an ecological perspective) that the DM population advanced toward the former forest fen area.
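The per-square-meter counts discussed above result from aggregating classified pixels to a grid. A minimal sketch of such an aggregation is given below, assuming that the pixel centers of the filtered DM-positive pixels are available as projected coordinates in meters (e.g., UTM easting and northing) and that one pixel represents one individual. The coordinates are invented, the function names are hypothetical, and the display threshold of five individuals is inferred from the lower end of the reported count ranges.

```python
from collections import Counter
from math import floor


def count_per_cell(pixel_centers, cell_size=1.0):
    """Count pixel centers per grid cell; cells are indexed by their lower-left corner."""
    counts = Counter()
    for easting, northing in pixel_centers:
        cell = (floor(easting / cell_size), floor(northing / cell_size))
        counts[cell] += 1
    return counts


def cells_above(counts, minimum=5):
    """Keep only the cells that reach the (assumed) lower display threshold."""
    return {cell: n for cell, n in counts.items() if n >= minimum}


# Invented pixel centers in meters; real UTM coordinates would be much larger numbers.
dm_pixels = [
    (100.1, 200.2), (100.4, 200.9), (100.5, 200.5),
    (100.7, 200.1), (100.9, 200.3), (100.2, 200.8),
    (101.1, 200.4), (101.6, 200.7),
]

counts = count_per_cell(dm_pixels)
print(dict(counts))
print(cells_above(counts))
```

Because the cell index depends only on the coordinates and the cell size, the same grid can be reused for surveys in later years, so that counts remain comparable cell by cell.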

Relevance to Nature Conservation and Management
The study presented an approach to map DM abundance as an effective way of communicating results from a remote sensing-based analysis to a conservationist audience. Using drone data, conservationists can evaluate the success of conservation measures by adding a comprehensive spatial perspective to a snapshot of the population development of DM. By aggregating plant individuals to a referenced grid, we paid special attention to ensuring the reproducibility and extensibility of the presented approach. Therefore, the resulting map can be regarded as an initial state for long-term monitoring. By conducting the same analysis in subsequent years, an objective and spatially precise assessment of the development of the DM population in our study site becomes possible.
In addition to the methodological advantages presented, using drones for plant surveys brings another benefit for nature conservation, i.e., it is possible to avoid plant damage caused by trampling. The surveyor only needs to enter the habitat to lay out or remove the calibration targets for the drone flight, which are generally located at the edges of an area of interest.

Conclusions
In this study, we developed and examined a drone-based approach to estimate the spatial distribution and abundance of DM in the Lehmkuhlen reservoir using very-high-spatial-resolution drone data. The results emphasize that our approach could produce valuable data on the status of a DM population during its flowering phase by highlighting the unique spectral response of magenta-colored vegetation. We integrated the spectral characteristics in our newly developed MaVI. A SHAP feature importance analysis of a random forest model demonstrated the strong performance of the MaVI in identifying DM. In addition to the MaVI, the most suitable features were the NIR/red and the green/red band combinations. We, therefore, recommend integrating the MaVI, the CVI, and the GARI to reliably classify the presence of DM. However, in this study, transferring the classification result to a remote sensing plant count was limited by image artefacts in the available data, which we summarized under the term ambiguity problem. We tried to cope with the ambiguity problem by optimizing the remote sensing plant count against in situ plant counts, applying a post-classification median filter on an image object level to reduce the RMSE. The error metrics indicated a noticeable improvement, while a visual inspection of the filtered classification results revealed that the ambiguity problem persisted. Consequently, the ability of this approach to accurately estimate plant counts is limited by its underlying assumption that one pixel indicates one plant individual. Nevertheless, our approach can supplement monitoring programs with information on plant count with acceptable accuracy to address data scarcity. Additionally, our approach can extend DM monitoring by assessing the spatial distribution of a plant population, representing a step forward from simple plant counts to indicate the success of conservation measures.