Multitemporal Classification of River Floodplain Vegetation Using Time Series of UAV Images

The functions of river floodplains often conflict spatially, for example, water conveyance during peak discharge and diverse riparian ecology. Such functions are often associated with floodplain vegetation. Frequent monitoring of floodplain land cover is necessary to capture the dynamics of this vegetation. However, low classification accuracies are found with existing methods, especially for relatively similar vegetation types, such as grassland and herbaceous vegetation. Unmanned aerial vehicle (UAV) imagery has great potential to improve the classification of these vegetation types owing to its high spatial resolution and flexibility in image acquisition timing. This study aimed to evaluate the increase in classification accuracy obtained using multitemporal UAV images versus single time step data on floodplain land cover classification and to assess the effect of varying the number and timing of imagery acquisition moments. We obtained a dataset of multitemporal UAV imagery and field reference observations and applied object-based Random Forest classification (RF) to data of different time step combinations. High overall accuracies (OA) exceeding 90% were found for the RF of floodplain land cover, with six vegetation classes and four non-vegetation classes. Using two or more time steps compared with a single time step increased the OA from 96.9% to 99.3%. The user’s accuracies of the classes with large similarity, such as natural grassland and herbaceous vegetation, also exceeded 90%. The combination of imagery from June and September resulted in the highest OA (98%) for two time steps. Our method is a practical and highly accurate solution for monitoring areas of a few square kilometres. For large-scale monitoring of floodplains, the same method can be used, but with data from airborne platforms covering larger extents.


Introduction
River floodplains have a rich array of functions that often conflict spatially, such as water conveyance and storage during peak discharge, riparian ecology, agriculture, and recreation [1,2]. These functions are often associated with the floodplain's land cover, more specifically, its vegetation. Floodplain vegetation cover is highly dynamic and heterogeneous in space, reflecting its sharp gradients in environmental conditions, such as sediment type, soil moisture and the flood frequency induced by the river. The high spatial and temporal variation in these conditions result in patchy vegetation with seasonal development and new arrangements from year to year on a scale of a few metres. Vegetation height and greenness are important characteristics in floodplain vegetation which vary strongly over time and between vegetation types [3]. Vegetation height is an important parameter for discriminating vegetation types and for vegetation roughness parametrization in flood models. Vegetation greenness is an important indicator of the chlorophyll activity in the leaves of the vegetation; it varies strongly over the growing season, but also between vegetation types. Vegetation greenness is, therefore, an interesting characteristic to discriminate vegetation types by. Frequent monitoring of floodplain land cover is necessary to capture the dynamics of floodplain vegetation and its related functions.
Monitoring of floodplain land cover is commonly performed by making maps which are generally obtained from remote sensing (RS) data. Detailed floodplain vegetation maps are commonly produced by manually delineating and identifying vegetation objects using remote sensing data and visual interpretation using a classification tree, e.g., for the Mississippi River [4], Murray-Darling Basin [5], the Rhine Delta [6] and River Meuse [7]. However, low user's accuracies have been reported for production meadow, natural grasslands and herbaceous vegetation, whether they are obtained with traditional airborne (user's accuracy (UA) = 38-74%), high resolution spaceborne (57-75%) or Light Detection And Ranging (LiDAR) remote sensing (57-73%) [8][9][10][11]. Even though vegetation height is an important discriminator for floodplain vegetation types, this information is not available from traditional airborne and spaceborne imagery and is difficult to obtain with an error less than 15 cm with airborne LiDAR for low vegetation types [12]. Traditionally, the classification of floodplain vegetation has mainly been performed with data sets collected at a single moment in time [13]. Several studies have shown that the use of multitemporal data increases the classification accuracy, owing to the different degrees of variation in the spectral characteristics of vegetation types through the seasons [14][15][16][17][18]. These studies were carried out with satellite data, because of the relatively low costs of obtaining imagery at a high frequency for large areas, compared with airborne imagery. However, for semi-natural riverine landscapes, which are dominant in Europe, the low spatial resolution of the spaceborne data limits the ability to differentiate between floodplain vegetation types. Furthermore, while multitemporal data might improve the accuracy of floodplain vegetation maps, it is still unclear how many and which moments during a year are optimal for the acquisition of monitoring data for floodplain land-cover mapping. For example, Langley et. al. [19] found that time series of spectral data do not always improve the grassland classification accuracy, due to similarities between vegetation types at certain time steps or the effect of shadow during low sun angles in winter.
In recent years, the increased availability of unmanned aerial vehicles (UAVs) has allowed low-cost production of very high-resolution orthophotos and digital surface models (DSMs) [20]. These data have high potential for floodplain vegetation monitoring, because they enable frequent production of detailed greenness and vegetation height maps. Over time, height and greenness attributes may improve the classification accuracy of floodplain vegetation patches, since these characteristics change over the seasons in specific ways for different vegetation types [3]. An object-based classification of riparian forests based on UAV imagery also indicated spectral and vertical structure variables as being the most discriminating [21]. Time series of UAV images have mostly been used for elevation change mapping studies, reporting vertical accuracies of 5-8 cm for bare surfaces [22][23][24][25]. To monitor the crop height of barley, Bendig et al. [26] reported a vertical error of 0.25 m and for natural vegetation UAV-derived DSMs can be used to estimate the vegetation height with an error of 0.17-0.33 m [3].
A downside of a multitemporal dataset is the large number of variables available for classification, of which only a small fraction is likely to contribute significantly. Consequently, a classification model may tend to consider the less important variables (noise) as well for its predictions, which makes it sensitive to overfitting. Model complexity and the number of available variables are important components of a classification procedure to affect overfitting. A complex classification model, in which a large number of variables are available for the prediction, may be more prone to overfitting, because there is a higher possibility it will include noise variables in addition to high value variables. Different levels of model complexity can be achieved by varying the number of variables the model can include. Overfitting may also arise when using a large dataset with many variables. Further testing is required to determine whether the number or type of variables used in the model can be optimized in addition to the number of time steps considered.
The manual identification of map units (objects) is a time-consuming method, which may be bypassed by using a semi-automatic, object-based approach, in which a segmentation algorithm groups the pixels into homogeneous objects. These objects are then classified based on the spectral signal of the object, as well as the shape and context characteristics of the object. By using the signal of the object instead of the pixel, the signal is amplified and, additionally, allows for the use of variation within the object as an attribute [27]. Thus, combining multitemporal, UAV-derived, high-resolution imagery with object-based classification techniques has the potential to offer a major step forward in the automated mapping and monitoring of complex and dynamic floodplain vegetation in high detail.
The aims of this study were (1) to document floodplain land cover based on the classification of multitemporal UAV-derived imagery with an object-based approach, focusing on low vegetation; (2) to evaluate the effect of varying the number and timing of acquisition moments on the classification accuracy of multitemporal input data versus single time step data; and (3) to quantify the effect of different levels of complexity of the dataset and classification model on the classification accuracy.

Study Area
We performed our classification study on the embanked Breemwaard floodplain ( Figure 1C). The area of 116 ha is located on the Southern bank of the river Waal ( Figure 1A), which is the largest distributary of the river Rhine in the Netherlands. Typical vegetation in the Breemwaard includes pioneer vegetation (e.g., scutch grass, tansy, narrow-leaved ragwort, and red fescue), natural grassland (e.g., creeping cinquefoil, English ryegrass, red clover, and yarrow), production grassland (e.g., English ryegrass and orchard grass), herbaceous vegetation (e.g., narrowleaf plantain, common nettle, dewberry, water mint, creeping cinquefoil, and black mustard) reed and riparian woodland (e.g., white willow, black poplar, and alder) ( Figure 1B). Different management practices, such as grazing and mowing, have resulted in a heterogenous distribution of these vegetation types. Along the banks of the Waal bare sand, protection rock/rubble of groynes and pioneer vegetation are the main surface cover. The four water bodies are the result of clay mining in the past. For a more detailed description of the floodplain and its history, see Peters et al. [28].

Reference Data
Reference data, representing homogeneous vegetation units, were collected in the field and supplemented by digitized areas from orthophotos. Low floodplain vegetation (<3 m) is best identified in the field. Therefore, 28 field plots (dimensions approximately 15 × 15 m) were selected in February 2015 based on a range of different vegetation densities [3]. In September, their vegetation types were determined based on the average height and dominant species in the plot, e.g., pioneer vegetation, natural grassland, production grassland, herbaceous vegetation, and reed ( Figure 1B). Because of the dormant state of the vegetation in February, the initial field plots were, in some cases, quite heterogeneous regarding their vegetation type. At the end of the growing season in September, some plots contained a mixture of two vegetation types, of which an example can be seen in Figure 2. Field sketches of the plots in September were used to assign classes to different homogeneous parts (referred to as polygons) within the plots ( Figure S1). The field reference polygons, including the intact homogeneous field plots, were supplemented by polygons visually identified in the UAV imagery to account for under-or unrepresented land-cover types in the field data, i.e., pioneer, reed, trees, water, bare sand, sealed road, and rocks/rubble. These visually identified polygons are delineated with a dotted circle in Figure 1 and vary in shape to maintain homogeneity, depending on the vegetation type. For example, the polygons on unpaved walking paths with pioneer vegetation or sealed roads required a more elongated shape to maintain class purity. In total, we obtained 86 reference polygons. Field impressions of each land cover class, with winter and summer conditions for the vegetation classes, can be found in the Supplementary Information (Figure S2).
The ratio of the summed area of the reference polygons per class over the total area of reference polygons was chosen such that it was similar to the ratio of estimated surface area of each class in the study area. The total area of the reference polygons was largest for the natural grassland and herbaceous vegetation classes ( Table 1). The polygon sizes of pioneer vegetation and natural grassland were smaller than those of other vegetation classes, due to their large heterogeneity. The sealed road and rubble classes also had relatively small polygons, because these were small or narrow continuous surfaces in the study area.0  Survey data were collected as described in Van Iersel et al. [3]. In brief, the UAV imagery was collected at six moments during one growing season: February, April, June, September, and November 2015, and January 2016. The Swinglet CAM by Sensefly [29] was used as the platform for all flights, except for June when we used a Sensefly eBee because of repair work on the Swinglet. A Canon IXUS 125 HS was used to acquire true colour (RGB) imagery and a modified Canon IXUS 125 HS for false colour (CIR) imagery in two separate flights, because the UAV could only carry a single camera. The surveys were conducted between 7 a.m. and 6 p.m., depending on the sunrise and sunset timing which differed over the season, on days with low wind speeds and constant weather conditions. In total, 38 white vinyl markers, supplemented by five street lanterns and road markings, were used as ground control points (GCPs) for georeferencing. The obtained imagery was processed with Surface-from-Motion (SfM) in Agisoft Professional [30] to obtain point cloud DSMs and orthophoto mosaics of the study area [3]. Per survey, we obtained a true colour orthophoto (ortho RGB ), a true colour DSM (DSM RGB ), a colour infrared orthophoto (ortho CIR ), and a colour infrared DSM (DSM CIR ), resulting in 12 orthophotos and 12 DSMs. The orthophotos were raster formatted, whereas the DSMs consisted of point clouds. Agisoft reported errors of less than 0.1 m in both the XY and Z directions for its produced orthophotos and DSMs.

DSM and Orthophoto Processing
The DSMs represented the elevation of the floodplain surface model above sea level, and not the vegetation height. The height of each DSM point above the terrain surface (nDSM point ) was calculated as the difference from a LIDAR bare earth model [31]. The terrain surface was based on the LiDAR-based digital terrain model (DTM) of the Netherlands, which has a vertical accuracy of 5 cm for bare earth. The nDSMs point were rasterized to a 25-cm grid (nDSMs raster ), to use them for segmentation in the eCognition Developer software [32]. To match the orthophotos' resolution, the point nDSM was resampled into a 5-cm grid using nearest neighbour assignment. Large water areas were manually masked in the nDSMs raster because of outliers in the dense point clouds due to matching errors in the SfM method with the continuously changing water surface.
The ortho CIR were used to calculate a vegetation index for each time step. The available spectral channels of the CIR camera (blue, green and near infrared) did not allow the commonly used NDVI based on red and near infrared to be calculated. Therefore, we used the blue channel instead of red and calculated a 'consumer-grade camera' vegetation index (CGCVI) in accordance with Rasmussen et al. and Van Iersel et al. [3,33]. A CGCVI layer was calculated for each time step, resulting in six additional layers. After processing, three types of raster layers were obtained per time step: (1) single band raster nDSMs; (2) three-band raster orthophotos; and (3) a single band raster CGCVI image. In total, 54 layers were available for multitemporal segmentation and classification: six nDSMs RGB,raster , six nDSMs CIR,raster , six ortho RGB with three channels, six ortho CIR with three channels and six CGCVI (Figure 3; input).

Methods
To achieve the three research aims, we performed the analysis in three main steps ( Figure 3). First, we used the dataset of multitemporal UAV imagery and reference data to evaluate the classification of floodplain land cover with an object-based approach. For the object-based approach, segmentation was required, which is the process of grouping pixels of the image together into image objects or segments [34], followed by classification of these segments during which, land-cover classes are assigned to the objects. The raster layers were segmented and the obtained objects were classified using a Random Forest (RF) classifier. Second, this process was repeated, leaving out the image data from the time step that contributed the least to the overall classification accuracy until only the data from one time step remained. In this way, we evaluated the effect of varying the number and timing of acquisition moments of multitemporal input data versus single time step data. Third, to quantify the effect of the different levels of complexity of the dataset and of the classification model on the classification accuracy, the classification over a step-wise reduction of time steps was repeated with a less complex RF and with a reduced input dataset.

Object-Based Multitemporal Classification and Validation
To perform an object-based image classification, the 54 raster layers were first segmented. The obtained segments were assigned 108 spectral and height attributes, which were calculated from the 54 raster layers (i.e., 54 mean & 54 standard deviation (SD)). Second, we performed an RF classification to select the most discerning attributes of the objects. In a third step, a new image segmentation was carried out, now using the layers on which these most important attributes were based and independent of earlier obtained segments. Fourth, this second segmentation was again classified with an RF to check the classification accuracy and to determine the attributes from which time step contributed least to the classification accuracy. This processing loop was repeated five times, each time leaving out the time step that contributed least to the classification (Figure 3).

Segmentation
Many object-based classifications have shown superior results over pixel-based landscape classification [34]. In vegetation classification, the within-area statistics of the pixels making up the object are an interesting feature, because some vegetation objects, such as a patch of herbaceous vegetation, are expected to show a relatively large internal variation in colour and height compared with smoother grassland objects. For this reason, the layers were segmented into objects before classification, using eCognition Developer [32]. To compare our vegetation objects from the imagery with the field reference data, an object size less than the field-plot size was necessary, which was achieved with a scale parameter setting of 60 for all 54 layers. The shape and compactness were both set to 0.1. An object was selected for the training or validation of the classification if it overlapped at least 50% with a reference polygon. If it overlapped with more than one polygon, it was assigned the class with which it shared at least 50% of its surface area. The attributes per object were (1) class; (2) the X and Y coordinates of the object centre; (3) the mean value of each layer; and (4) the standard deviation of each layer.

Variable Selection and Second Segmentation
All reference objects obtained from the segmentation were divided into two sets of equal size by splitting them alternately per class based on their X-coordinate into a training and validation dataset. This was done to ensure an equal number of objects per class in both the training and validation sets and to make sure both sets represented the variation in the entire study area. For classes with more than 1000 objects, a random sample of 1000 objects was taken first, which was then divided into the training and validation sets, as described above. All classes except sealed road and rock/rubble had more than 1000 objects. The function varSelRF in R [35] was used to select the most discerning attributes of the multitemporal data set. During this selection step, all attributes were included, thus variables of all time steps could be selected. VarSelRF iterates over multiple RF classifications, dropping variables until the out-of-the-bag-error (OOB) becomes larger than the initial OOB error until only the relevant variables remain. Although the varSelRF is also a classifier, here, it was only used for attribute selection. A new segmentation was then performed in eCognition to obtain segments based on only the layers of these selected attributes. The objects of the varSelRF segmentation were exported as described in the previous section.

Classification
The varSelRF-based objects were again sampled up to 1000 objects and alternately split into training and validation datasets based on their X-coordinate. An RF classifier was built in R based on these samples for each class from the training set and using all 108 attributes. The RF classifier was built with 10,000 trees, and the maximum number of end nodes (maxnodes) was set to 25 to prevent overfitting (RF maxn=25 ). In data sets with a large number of variables, it is likely that only a small fraction of these contribute significantly to a high accuracy level. An unrestrained classification model may tend to consider noise variables as well for its predictions, which could lead to overfitting.

Validation
The classification accuracy of the RF was determined using the independent validation set and was reported as the overall classification accuracy (OA) and Kappa statistic (κ). OA is the percentage of true positives in all evaluated objects and (κ) corrects the OA for the possibility of an agreement occurring by chance. The OA and κ were calculated from the confusion matrix, which accounted for the area of the objects in order to correct for object size. To check for overfitting of the RF model, an additional validation was performed with the training dataset, resulting in an OA train and κ train . For the difference between OA val and OA train ≤ 0.01 percent point (pp), overfitting was assumed negligible. Moreover, the user's accuracy was used to inspect the performance of individual classes from the point of view of a map user, as it shows how often the class on the map will actually be present on the ground. The producer's accuracy indicates how often real objects of this class on the ground correctly show on the classified map.

Classification Accuracy with Step-Wise Decrease in the Number of Time Steps
Since the costs for monitoring are directly related to the number of surveys carried out (i.e., time steps), the relation between the number of monitoring time steps per year and the resulting OA is of high relevance. To determine which moments are essential, the time step contributing the least to the OA (TS least ) was dropped repeatedly from the segmentation and classification, until only one time step remained (Figure 3). To determine the TS least of the total of six time steps, the RF was performed six times, each time leaving out either February, April, June, September, November, or January. TS least was selected from the RF loop with the least decrease or even increase in OA compared to the condition where all time steps were included and the layers and derived attributes of TS least were excluded from further analyses. The next step was to find the TS least of the remaining time steps with the same method, for which we subsequently performed a new segmentation, variable selection, a second segmentation with the optimal layers from five time steps and five times the RF including only four time steps. These steps were repeated until only a single time step remained (Figure 3). To guarantee similar object sizes over the entire work flow, the scale parameter was adjusted in each segmentation to maintain 5 × 10 6 ± 10% segments for the whole study area.

Varying Complexity of the Classification Model and the Data Set
Two additional analyses were performed with the six segmentations that included layers of one to six time steps. This means that the first three blocks in the "Classification" of Figure 3 were skipped to reduce calculation time, but the rest of the method was repeated. First, maxnodes was set to default for the best obtainable RF model (RF de f ault ) for the Breemwaard dataset. With maxnodes set as inactive, the RF can build each tree to its maximum depth [36]. In this case, each tree may consider noise variables as well for its predictions, which increases the risk of overfitting. Second, the RFs were built using only the spectral attributes (RF spectral ) and excluding the nDSM attributes. In this way, the added value of multitemporal data with and without elevation data could be determined. Note that the segments used for this additional analysis were obtained by segmenting both the spectral and nDSM layers. Maxnodes was kept at 25.
The RF classifier with the highest OA was applied to the corresponding layers to classify the entire study area. The result of this classification was visualized by mapping the classified objects of the entire study area, and this thematic map was compared to the orthophoto of September by visual inspection.

Object-Based Multitemporal Classification
From the available 54 layers (nine layers x six time steps), for the first multitemporal segmentation, the varSelRF function selected 23 out of 108 attributes, including (1) only mean attributes; (2) all mean nDSM attributes; and (3) all mean CGCVI attributes. No SD attributes were selected by varSelRF. The second segmentation, performed with only the layers of the varSelRF-selected attributes, resulted in an OA of 0.939 and a κ of 0.926 (Table 2). Table 2. Classification accuracy with step-wise decrease in the number of time steps. The * indicates the time step which adds the least value. This time step is the group of 18 attributes collected for this specific time step and is not used in further analyses. OA is the overall classification accuracy and κ is the Kappa coefficient. The subscript val indicates that validation is based on a validation data set and train is based on a training data set. Bold OA val show the accuracy of the RF with the same time steps used for the segmentation, which are plotted in Figure 4, labelled as 'maxnodes-25'.

Classification Accuracy with
Step-Wise Decrease in the Number of Time Steps Table 2 shows the classification accuracies after subsequently leaving out a time step with maxnodes set to 25. For each of the six segmentations, the classification accuracies are given for every possible n−1 combination of the remaining time steps. The number of time steps included in the segmentation decreases in each row with TS least , which was based on the highest OA val obtained after the time step was omitted (marked by the *). For example, for n = 6, the OA val even increased from 93.9% to 94.1% when February (FEB) was omitted. Moreover, OA val continued to increase until n = 2, even though the differences in OA val were minimal (Figure 4, maxnodes-25). The most important time step is June, because it was the last remaining time step and thus, was never selected as the least contributing. In increasing order of importance, February, January, April, November, and September were shown to be TS least .
All classifications with the RF maxn=25 resulted in high OA val and κ val , both exceeding 90%. The OA and κ were higher for the multitemporal sets than those obtained for the remaining single time step in June (Figure 4). The differences in OA and κ were remarkably small among the different multitemporal sets. The highest accuracy was achieved using the June and September datasets; the OA val was 94.6%, compared with 91.6% for June only. This means that adding September's data to the classification based solely on June increased the classification by 3 percent point (pp), which thus decreased the initial error of 8.4% by about one-third. Overfitting was not an issue in any classification, as OA train -OA val was equal to or less than 1.36 pp ( Table 2). User's accuracies exceeded 85% for all classes except for natural grassland (60-73%) for 1-6 time steps (Table S1). This may be explained by its high within-class variability and confusion between the pioneer and herbaceous vegetation classes, due to structural and spectral similarity.

Varying Complexity of the Classification Model and Data Set
The RF de f ault resulted in significantly higher OAs compared to the less complex RF maxn=25 (p = 0.00001) (Figure 4). We found the highest accuracy when including all six time steps: the OA val was 99.3%, compared to 96.9% for June only. This is an increase in accuracy of 2.4 pp, which equals a 76% decrease in classification error. With RF de f ault , the OA decreased much more gradually when eliminating time steps, compared with the sudden decrease in OA and κ at one time step obtained with RF maxn=25 . User's accuracies exceeded 75% for all classes, even when using one time step, and were larger than 90% for three time steps or more. Overfitting was only a minor issue with RF de f ault , as OA train -OA val stayed below 3 pp. Validation with the training set resulted in an accuracy of 100%.  When the nDSM-related attributes were excluded from the RF classification, the OA was significantly lower (p < 0.01) compared with RF maxn=25 (Figure 4). The OA increased from 86% to 92% by using three or more time steps. However, reed and herbaceous vegetation still had user's accuracies of 51-68%. Error matrices of n = 1 and n = 6 for the three levels of complexity can be found in the Supplementary Information (Table S2).

Classified Land-Cover Map
The natural vegetation types only have small differences in their spectral and structural characteristics, but they were well discriminated by the RF (Table 3). Pioneer vegetation is sometimes confused with bare sand, which is understandable as this sparse vegetation does not fully cover the bare sand. Overall, the classification with RF de f ault matched the vegetation patterns that were observed on the orthophoto at the end of the growing season (September) ( Figure 5A). The production grassland area in the east was well delineated. The detailed alternating patterns of natural grassland and herbaceous vegetation were also excellently mapped. For example, the stripy pattern in the centre of the floodplain is caused by small terrain height differences of approximately 0.4 m, which results in different vegetation types in the lower (grassland) and higher (herbaceous) sections ( Figures 4C and 5B). These terrain height differences are man-made remnants of old forestry methods from the 19th and 20th centuries, so-called rabatten. Along the banks of the water bodies, wet pioneer vegetation or wet bare sand is sometimes classified as rock or sealed road, which is understandable due to its low reflectance and irregular surface that are similar to the rock/rubble class ( Figure 5A).

Discussion
This is the first study to use UAV multi-seasonal data for object-based classification of natural non-woody vegetation at a high spatial resolution, according to the authors' knowledge. High overall accuracies of more than 90% were found, even with a single time step. The high classification accuracies were caused by the additional attributes on vegetation height compared with traditional airborne imagery which reports lower OAs of 75-77% [9][10][11], as well as the powerful RF classifier.

Object-Based Multitemporal Classification
The varSelRF function showed that only the object means of layers were relevant attributes, whereas SD attributes were unimportant. Although the variation within the objects did not show up as an important attribute, the object-based approach is still recommended because of noise reduction. Calculating the mean or another measure of central tendency for an object reduces the effect of extreme pixel values and decreases noise [37]. Hence, object-based classification decreases the spectral confusion of classes.
Weil et al. [38] were able to classify herbaceous patches within in a Mediterranean woody vegetation with a user's accuracy of 97.1% and an OA of 85% with only spectral UAV data. However, their "herbaceous patches" class included all low vegetation, whilst all other classes were specific tree species; in our study, we differentiated several low vegetation classes and still obtained high OAs. This detailed classification of low vegetation types is required to map hydraulic roughness in river flood modelling [39]. In addition, it is the level of detail at which the habitat characteristics of small floodplain fauna such as butterflies need to be monitored [40]. Other studies classifying similar natural vegetation with multitemporal satellite data yielded lower OAs of 80-90%, but also produced maps covering much larger areas [14,16]. The high classification accuracies found with multitemporal UAV imagery are owing to (1) the high resolution combined with an object-based approach [41]; (2) the addition of vegetation height data, which is of major importance when discriminating between classes like grassland, shrub, and herbaceous vegetation [3,9]; and (3) the use of an RF classifier which was also proven to be the most powerful by Weil et al. [38] using multitemporal data.

Required Number of Time Steps
OA and κ are up to 3 pp higher for the multitemporal sets than those of the single time step for June, produced with RF de f ault , RF maxn=25 and RF spectral . Interestingly, for the less complex RF maxn=25 , the differences in OA and κ were minimal between the different multitemporal sets. Weil et al. [38] also found no prominent improvements in OA in their RF classification of woody vegetation after the first three optimal time steps. Most likely, the RF did not produce a sufficiently large majority of trees containing the important variables, due to the pre-set limitation on the number of end nodes. This may be explained by the small difference in the added value to the classification accuracy between the included variables and the high number of variables involved in the classification with more than two time steps. As a result, classification using a larger number of attributes (in this case, based on more time steps) might not necessarily result in higher OA. In this study, this mostly affected the user's accuracies of classes with a high internal variability, such as natural grassland, because the restricted trees in RF maxn=25 do not capture this variability. When only using the June data, important variables were clearly missing in the forest, resulting in a much lower OA val .
The highest OA was found when combining June and September, which may be explained by the fact that June has the moment of maximum photosynthetic activity, and hence, the highest CGCVI, while in September, most vegetation is at its maximum height. Hence, spectral and structural differences between vegetation types are most pronounced in these months, while data from other months become redundant, add noise, and may increase confusion between classes. These findings are similar to what Michez et al. [21] concluded from their multitemporal study classifying riparian forest species. They found that the late vegetation season was the most appropriate time window to perform UAV surveys, because phenological differences between riparian forest species are the most enhanced.

Varying Complexity of the Classification Model and of the Data Set
The RF de f ault had overall higher OAs than RF maxn=25 , because the trees were built to their full extent. It was possible for each training object to have its own end node in these fully grown trees. However, this also increased the risk of using noisy variables to build the tree, potentially causing overfitting on the training data. Therefore, we built a large number of trees (i.e., 10,000) to correct for overfitting [36], and the differences between OA val and OA train of 0.03 or less show that overfitting was indeed minimal [42].
Multitemporal data is also of high value for classifications based only on spectral UAV data. The OA obtained using spectral information increased from 86% when based on a single observation moment to 92% when data acquired during three or more time steps were used. Even with multitemporal data, user's accuracies obtained for reed and herbaceous vegetation were low ( Supplementary Information, Table S1), because without height information, these classes are easily confused with each other or with spectrally similar classes such as natural grassland. Note that the objects used in RF spectral were obtained from segmentation of both nDSM and spectral raster layers.
The less complex RF maxn=25 classifier is sensitive to the number of variables, as using data from more than two time steps did not improve, and even decreased, the OA ( Figure 4). This counterintuitive result is due to the large number of variables, of which many are correlated or not discriminative between classes, which hampers identification of the important variables in the RF procedure. This effect is negligible when a smaller total number of variables per time step is available, as was the case with only spectral attributes of the objects. Moreover, OA improved significantly when using the default tree depth (RF de f ault ) instead of a limited depth (RF maxn=25 ), because the larger number of tree nodes led to more frequent selection of the important variables, hence leading to a better overall RF classification performance.

Training and Validation Sets
The RF performance is also sensitive to the number of classes and the number of objects per class. In a few experimental runs with more classes, for example, quite similar types of herbaceous vegetation, the RF developed a preference towards one of these classes and misclassified the other class, instead of confusing them equally. In the experiment, the class with the highest internal variation was predicted the least, unless its number of objects was increased to better capture the within-class variation by the trees of the RF. Moreover, the RF was susceptible to the resemblance of the training and validation sets. In our study the validation and training objects were obtained from the same reference polygons. Switching the training and validation sets resulted in OAs of the same order of magnitude ( Figure S3), again with June and September being the most important time steps (Table S2) . Surprisingly, if the polygons were first split into either training or validation, the resulting RF would sometimes fit the validation data better than the training data. On the other hand, the structured X-coordinate-based splitting of the reference objects into training and validation sets resulted in high spatial correlation between the sets. Random sampling of the reference objects also resulted in OAs of the same order of magnitude ( Figure S3) with June and September being the most important time steps (Table S3). Most importantly, the improvement in OA with multitemporal input data versus single time step input data was still clear with different sampling methods for the training and validation sets ( Figure S3).

Floodplain Vegetation Map
The highest OA of floodplain vegetation was obtained with UAV imagery of all six monitoring moments and an RF with default settings that built 10,000 trees. The alternations in grassland and herbaceous vegetation were mapped in very high detail. We observed many different patterns in these vegetation types, resulting from topographical or substrate differences, or grazing by cows and horses. Mapping errors still occurred between non-vegetation classes, which gained little benefit from using multitemporal data because they are relatively stable over time. These classes may benefit from a different sensor that is able to detect reflectance in the short-wave spectrum (1700-2300 nm) where the spectral differences between these materials are more pronounced [43]. Nevertheless, to document natural floodplain vegetation, the spectral characteristics used in our study were satisfactory, as we found user's accuracies of 95% and higher.
The improvement in land-cover-map accuracy with our method compared with other floodplain land-cover-classification studies [8][9][10][11] has important consequences for applications in hydrodynamic predictions and biodiversity estimations for floodplains. Straatsma et al. [44] predicted a reduction of 50% in the uncertainty of discharge distribution over the bifurcation points of rivers with an increase in the OA of floodplain land cover from 69% to 95%. We showed that this level of accuracy is now achievable for natural floodplain vegetation with multitemporal UAV data. More detailed maps allow for more detailed management of floodplains, for example, by allowing more complex spatial configurations of vegetation types, while maintaining overall safe conditions during high river discharge. A higher diversity in vegetation structure may be relevant as it is typical of the habitat of floodplain fauna. However, quantitative ecological knowledge on habitat characteristics, such as the spatial configurations of vegetation types, is species-specific. Some studies on small floodplain mammals have described, in detail, the necessary sizes and shapes of vegetation or other land-cover types [45,46], but these are rare. Habitat suitability is often based on expert knowledge which is more difficult to find and standardize for use in floodplain management.

Practical Considerations
For practical application, UAVs allow high-resolution and frequent monitoring areas of a few square kilometres, since image acquisition takes three days for an area of roughly 100 ha (1 day of flying and 2 days for GCP collection). For data collection over much larger areas, this method becomes costly. For example, a UAV survey of the entire Dutch Rhine delta would take 370 km 2 × 3 days = 1110 days, which would require 50 simultaneously working survey teams of 2 people to collect the data within a month (22 working days) and would cost approximately one million euros. For a cost-benefit analysis, it would be interesting to compare these costs to those of traditional airborne imagery acquisition from a plane or helicopter of the same region with a similar resolution of 5 cm and sufficient image overlap of 60% or more.
The above calculation excludes the costs for the LiDAR DTM which was used for normalization of the UAV DSMs. However, this can be obtained once every few years depending on the floodplain topography dynamics. Another possibility might be normalization of the DSMs with a sparsely vegetated winter DSM, which would be fine for the low, non-woody vegetation [3], but the nDSMs would become inaccurate for more woody vegetation as they maintain their height in winter and block the view of the terrain for the UAV.

Conclusions
We collected a large multitemporal data set with high resolution vegetation height and spectral information, which resulted in the availability of 108 variables for the object-oriented classification of floodplain land cover. Using data from six UAV surveys over a growing season and a Random Forest classifier (RF), we obtained overall classification accuracies of up to 99.3% and user's accuracies of at least 95%, even for similar classes, such as natural grassland and herbaceous vegetation. This high accuracy of floodplain land-cover maps has important implications for floodplain management, as it will improve hydrodynamic predictions and biodiversity estimations.
With the RF, we were also able to filter the many variables of the multitemporal dataset in terms of importance for the monitoring of floodplain land cover. With default RF settings, the use of two or more time steps compared with a single time step increased the overall classification accuracy from 96.9% up to 99.3%. The most important time steps were June and September, which is when most floodplain vegetation reaches its maximum values for, respectively, greenness and vegetation height. Additional testing with varying RF complexity and a smaller data set showed that overfitting was not a major problem, because the difference between validation with the training set and independent validation set was less than 3 percent points.
Our method is a practical solution for monitoring areas of a few km 2 . The same method can be used for large-scale monitoring of floodplains, but with data from airborne platforms covering larger extents. Our study can be used to improve the survey setup and planning with these platforms.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/10/7/1144/s1, Figure S1. The nine field plots containing a mixture of two vegetation types, Figure S2. Field impressions of the vegetation types in the Breemwaard study area, Figure S3. Accuracy of the RF classification by decreasing number of time steps for different sampling of the training and validation sets from the reference data, Table S1. Error matrices of classifications with 1) RF maxn=25 and RF de f ault with structural and spectral data and RF maxn=25 with only spectral data for segmentations with six (n = 6) and one (n = 1) time steps, Table S2: Classification accuracy with step-wise decrease in number of time steps with reversed training and validation set in the RandomForest classification, Table S3: Classification accuracy with step-wise decrease in number of time steps with random sampling of training and validation set per class.